Wednesday 31 May 2023

Healthcare isn't the only example of a two-tier system

Merit goods are defined as goods that are deemed to be socially desirable. Socially desirable things can often be provided by the market. However, merit goods are deemed to be somewhat problematic because either: (1) the market doesn't provide enough of the good, compared with the socially efficient (economic-welfare-maximising) quantity; or (2) the good is important, and if it was distributed by the market, some people would miss out on the minimum quantity of the good that is deemed to be necessary. Who decides what is socially desirable or what the minimum quantity of the good is? Those are political decisions. In other words, what constitutes a merit good is mostly a political decision, not an economic decision.

If merit goods are not provided in sufficient quantities, or are provided in sufficient quantities but not to everyone who we think needs them, then that suggests a role for government. That means that the government either subsidises merit goods or provides the merit goods itself. I was tempted to provide some examples that fit into each of those two categories, but it turns out that, for every example I considered (health, education, public transport, the arts) the government tends to do a mixture of both subsidising and public provision.

However, let's focus on public provision of merit goods. There is no free lunch. When the government spends on health or education, that means less spending on other things. There are trade-offs, and that makes it difficult for government to provide gold-plated goods and services to everyone. Sometimes, people just want more than the government is able to provide. That's why in New Zealand, alongside the public healthcare system, there is a growing private healthcare system, accessed mainly by those with private health insurance. This unavoidably creates a disparity in people's access to healthcare. Some people can only afford to access the public healthcare system, while others can choose to access public healthcare, or to pay extra (either directly or through health insurance) to access private healthcare.

Many people take issue with the disparity in access, arguing that it creates a two-tiered system and is therefore unfair. Take this article from The Conversation yesterday, by Elizabeth Fenton and Robin Gauld (both University of Otago):

Many seem to accept the argument that a two-tier public-private health system is not morally problematic, given most essential health services remain free to all. Some might go further and argue justice demands a two-tier system because health is only one public good the state is obliged to provide. Limiting non-essential healthcare services ensures it can meet those obligations.

The second private tier protects the liberty of those who want and can afford to purchase those services, while the first public tier focuses on meeting everyone’s needs to a sufficient level.

But the justice argument supports this conclusion only if the services and benefits provided in the first tier meet that threshold of sufficiency. Where exactly this threshold lies has been the subject of perennial debate.

Fenton and Gauld are clearly in favour of a more generous public healthcare system, which in turn would limit the necessity for the private healthcare system. Their argument is valid. It's simply a question of prioritisation of government spending (but keeping the trade-offs in mind). However, I do want to take issue with this bit from their article (aside from them saying in the quote above that health is a public good, when it isn't [*]):

When the worse-off are required to accept services below reasonable expectations of routine care (and the demonstrable harms that result), individuals are no longer in the same boat. The better-off live in a world of social goods and privileges inaccessible to the worse-off.

Why we accept this in health and not other sectors is an important question. It is hard to imagine school teachers only taking bookings months out to see parents seeking help for their troubled children, or denying entry to public schools due to limited capacity.

It is also doubtful we would accept teachers setting up private classes and consultation times to provide a timely service to those who can pay.

Fenton and Gauld need to reconsider their example of education. Education is a two-tier system as well. There are both public schools (nominally free) and private schools (with high fees). Parents can (and do) pay for private tuition for their children as well. Some of that private tuition is conducted by people who are teachers in their regular day job (a consequence of the education system paying teachers poorly). And that isn't even considering post-secondary education, where there are both public and private providers as well. 

Healthcare isn't the only example of a two-tier system. Consider transport (public transport vs. taxis or Uber), justice (legal aid vs. private lawyers), or security (police vs. private security firms). Unless there is some law that prevents the private sector from operating, any time the government is publicly providing a good or service, we'll end up with a two-tiered system, with one tier accessible to everyone, and a separate tier accessible to those who are willing and able to pay more.

*****

[*] Public goods are goods that are non-rival (meaning that one person's use of the good doesn't diminish the amount that is available for everyone else) and non-excludable (meaning that no one can be prevented from accessing the good). Healthcare doesn't meet either of those conditions (for some further explanation with other examples, see here and here). Healthcare is a private good - it is both rival and excludable. It may be a publicly-provided good (in many countries like New Zealand that have a public healthcare system), but that doesn't make it a public good.

Monday 29 May 2023

The compensating differential for university professors

I love my job. I honestly have the best job in the world [*]. Now, there is a serious problem inherent in those statements. It isn't that we shouldn't love our jobs. Every one of us deserves to do a job that we love. The problem is that, when we do a job that we love, we are willing to do so for a much lower wage than we could earn in a job that we didn't love nearly as much.

This is what economists refer to as a compensating differential. Consider two jobs that are similar in almost all ways, except that one job has some attractive non-monetary characteristics (e.g. it is a job that is pleasant, fun, and/or clean), while the other job has unattractive non-monetary characteristics (e.g. it is a job that is unpleasant, boring, and/or dirty). More people will be willing to do the first job. This leads to a higher supply of labour for that job, which leads to lower equilibrium wages. In contrast, fewer people will be willing to do the second job. This leads to a lower supply of labour for that job, which leads to higher equilibrium wages. The difference in wages between the attractive job that lots of people want to do and the unattractive job that fewer people want to do is the compensating differential.
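To see the mechanism with numbers, here is a minimal sketch in Python. It assumes linear labour demand and supply curves with made-up parameters (none of the numbers come from any data). The only difference between the two jobs is that the supply curve for the pleasant job sits further to the right.

```python
def equilibrium_wage(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Wage at which labour demanded equals labour supplied.

    Demand: workers = demand_intercept - demand_slope * wage
    Supply: workers = supply_intercept + supply_slope * wage
    """
    return (demand_intercept - supply_intercept) / (demand_slope + supply_slope)


# Hypothetical parameters; labour demand is identical for both jobs.
DEMAND_INTERCEPT, DEMAND_SLOPE, SUPPLY_SLOPE = 1000, 10, 10

# More people are willing to do the pleasant job at any given wage,
# so its supply curve has a larger intercept.
wage_pleasant = equilibrium_wage(DEMAND_INTERCEPT, DEMAND_SLOPE, 400, SUPPLY_SLOPE)
wage_unpleasant = equilibrium_wage(DEMAND_INTERCEPT, DEMAND_SLOPE, 0, SUPPLY_SLOPE)

compensating_differential = wage_unpleasant - wage_pleasant
print(wage_pleasant, wage_unpleasant, compensating_differential)
```

With these illustrative parameters, the pleasant job pays 30 and the unpleasant job pays 50, so the compensating differential is 20.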

Is there a compensating differential for university professors? According to this 2018 article by Daniel Hamermesh (Barnard College), published in the journal Economics of Education Review (ungated earlier version here), there is, and it is substantial. Using data from the American Community Survey for 2012-2016, and comparing workers with doctorates working in academia with workers with doctorates working in other occupations, Hamermesh finds that:

Comparing pay differences at various quantiles of the distributions, near the bottom of the pay distributions academics earn more than other doctorate-holders; but the differences rise steadily as we move up the earnings distributions, with academics’ pay beginning to fall below that of other doctorate-holders at the 17th percentiles of the distributions. At the 25th percentiles of the distributions the earnings advantage has turned into a disadvantage of 6%; at the medians it is 19%; and it rises to an astounding 50% disadvantage at the 95th percentiles. At the means academics receive 24% lower pay than non-academic doctorate-holders...

University professors get paid less than doctorate-holders in other occupations. The mean difference is US$29,802 (US$121,704 vs. US$91,902), and the difference at the 95th percentile is a staggering US$196,594 (US$394,058 vs. US$197,464). Controlling for various demographics, the mean difference in pay is 17 percent.
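The arithmetic behind those gaps can be checked in a few lines of Python. This is just a sketch using the dollar figures quoted above, and it assumes the percentage gaps are measured relative to non-academic pay (which matches the 24% and 50% in the quote).

```python
# Figures quoted above from Hamermesh (2018), in US dollars.
mean_other, mean_academic = 121_704, 91_902
p95_other, p95_academic = 394_058, 197_464


def pay_gap(other, academic):
    """Return the dollar gap and the gap as a share of non-academic pay."""
    gap = other - academic
    return gap, gap / other


mean_gap, mean_share = pay_gap(mean_other, mean_academic)
p95_gap, p95_share = pay_gap(p95_other, p95_academic)

print(f"Mean: US${mean_gap:,} ({mean_share:.0%} lower)")
print(f"95th percentile: US${p95_gap:,} ({p95_share:.0%} lower)")
```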

This suggests an incredibly large compensating differential for being a university professor. What explains it? Hamermesh looks at the flexibility of work timing using a simple utility model calibrated to data from the American Time Use Survey, and concludes that:

Under what seem like reasonable assumptions about utility a not tiny, but also not huge part of the earnings differential can be explained by the more equal distribution of leisure across days of the week that academics enjoy.

To be more specific, time flexibility can explain no more than a quarter of the difference in pay between university professors and other doctorate-holders. Hamermesh then looks at data from a survey of 288 academic economists, which asked which aspects of their job contribute most to their enjoyment of being a professor. The survey results show that:

Freedom and novelty of research, and the satisfaction of working with young minds, are by far the most important attractions of academe, listed by 88 and 74% of survey respondents respectively. Only 36% of respondents listed time flexibility as a top-three attraction, slightly fewer than listed enjoying intellectual and social interactions with colleagues.

University professors are poorly paid compared to workers with doctorates in other occupations. The compensating differential here appears to capture how much professors value the freedom and novelty of doing their own research, and the thrill and inspiration that comes from working with students. I totally buy into that argument. I love my job.

****

[*] Important note to my employer: Nothing in this post should be taken as an indication that I don't deserve to be paid way more than my current salary.

Sunday 28 May 2023

The case against 'greedflation'

The latest economics buzzword is 'greedflation' - the idea that firms exploit inflation by raising prices to create excessive profits (for example, see here). Aside from being a cool portmanteau and inspiring new memes, I just don't see it. In fact, I've said so. I wrote an article in The Conversation about it last month, and I was interviewed on RNZ's The Detail podcast last week (see also here, and similar points picked up by the New Zealand Herald's Front Page podcast here). I was pretty clear on The Detail - I'm a greedflation skeptic. This post outlines the theoretical and practical reasons why I believe that greedflation is an illusion.

First, let's consider how firms price their products. As I teach in my ECONS101 class, we assume that firms with market power are trying to maximise profits. If the firm sells a single product at a single price-per-unit, the profit-maximising quantity is the quantity where marginal revenue is exactly equal to marginal cost. As shown in the diagram below (which assumes a constant-cost firm, a point I will return to later), the profit-maximising quantity is QM. To sell the quantity QM, the firm sets the price equal to PM, because at that price the quantity of the product that consumers want to buy is exactly equal to QM. The difference between the price PM and the firm's costs PS is the firm's mark-up. The firm's producer surplus (profit) is equal to the area of the rectangle CBDF.

Now, once the profit-maximising price is set, there is no reason for the firm to deviate from that price. If the firm raises the price above PM, then by definition their profits must decrease (because PM is the price that maximises profits, so any other price must decrease profits for the firm). So, here we have the first theoretical case against greedflation - a firm that is already profit-maximising has no incentive to increase prices, because they are already maximising profits. It makes no sense for the firm to try and 'trick' consumers into paying a higher price, because consumers would buy less of the good.
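A small numerical sketch makes the point concrete. The Python below assumes a linear demand curve P = a - bQ and a constant cost per unit c, with no fixed costs. The parameter values are hypothetical and chosen only to keep the numbers round.

```python
def profit(price, a=100.0, b=1.0, c=20.0):
    """Profit at a given price, for demand Q = (a - price) / b and unit cost c."""
    quantity = max((a - price) / b, 0.0)
    return (price - c) * quantity


def profit_maximising_price(a=100.0, b=1.0, c=20.0):
    """Price at the quantity where MR = MC.

    With P = a - bQ, marginal revenue is a - 2bQ, so Q_M = (a - c) / (2b).
    """
    q_m = (a - c) / (2 * b)
    return a - b * q_m


p_m = profit_maximising_price()
for price in (p_m - 10, p_m, p_m + 10):
    print(price, profit(price))
```

In this sketch the profit-maximising price is 60, which earns a profit of 1600. Charging 10 more or 10 less earns only 1500, so a firm already at PM gains nothing by nudging its price up.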

Now, consider what would cause the firm to change the price it sets. First, a firm would likely change prices if its costs change. This is shown in the diagram below. If the firm's costs decrease from MC0 to MC1, then the profit-maximising price decreases from P0 to P1. This also works in reverse - if the firm's costs increase, then the profit-maximising price increases. This would not be 'greedflation'. Most proponents of the idea agree that a firm that is passing on higher costs to the consumer is not exploiting the consumer. Moreover, the research of Nobel Prize winner Daniel Kahneman and others shows that consumers see higher prices as 'fair' when they are driven by higher costs.

Second, a firm would likely change prices if demand changes. This is shown in the diagram below. When demand is shown by the curve D0, the profit-maximising price is P0, but when demand increases to D1, the profit-maximising price increases to P1, even though costs are the same. Is this 'greedflation'? Perhaps, if the firm tries to hide its increase in price behind a smokescreen of 'it's because of inflation'. Moreover, Kahneman's research (noted above) does show that consumers find price increases that arise from demand changes to be unfair. On the other hand, economists expect firms to increase prices when demand is high. It's how markets work on a routine basis, and isn't unique to a time of higher-than-usual inflation.

Third, a firm would likely change prices if consumers' price elasticity of demand changes. Price elasticity of demand is the consumer's responsiveness to a change in price. When demand is more price elastic, the demand curve is flatter, and the firm's optimal mark-up is lower. This is shown in the diagram below. If the demand curve is D0, then the profit-maximising price is P0, but if the demand curve was more elastic (D1), then the profit-maximising price is lower (P1), even though costs are the same. Why would demand become less elastic? There are many factors that affect the price elasticity of demand. However, most of them are fairly static and don't change much. The availability of substitutes, though, can change. When there are fewer substitutes available, as would happen if competition in the market decreased, that would make demand less price elastic, and raise the profit-maximising price for remaining firms in the market. Is this 'greedflation'? Again, perhaps, if the firm tries to hide its increase in price behind a smokescreen of 'it's because of inflation'. But I'd still argue that this is a routine consequence of a decrease in competition, and not unique to a time of higher-than-usual inflation. This explains the case of Air New Zealand, for example, which has been raised as an example of 'greedflation' in New Zealand. Jetstar wound down its services during the pandemic, reducing competition in the market for domestic air travel, and not surprisingly, Air New Zealand raised domestic airfares.
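The link between elasticity and the mark-up can be sketched with the standard textbook inverse-elasticity rule, P = MC × e / (e - 1), where e is the absolute value of the price elasticity of demand. That rule is an assumption added here for illustration (it holds for a profit-maximising firm facing elastic demand, and is not taken from the diagrams above), and the cost and elasticity values are hypothetical.

```python
def optimal_price(marginal_cost, elasticity):
    """Profit-maximising price under the inverse-elasticity rule.

    elasticity is the absolute value of the price elasticity of demand,
    and must be greater than 1 for the rule to give a finite price.
    """
    if elasticity <= 1:
        raise ValueError("elasticity must be greater than 1")
    return marginal_cost * elasticity / (elasticity - 1)


MARGINAL_COST = 10.0  # hypothetical cost per unit

# Many substitutes available: demand is more price elastic.
price_more_competition = optimal_price(MARGINAL_COST, elasticity=5.0)
# Fewer substitutes available: demand is less price elastic.
price_less_competition = optimal_price(MARGINAL_COST, elasticity=2.0)

print(price_more_competition, price_less_competition)
```

With the same cost of 10, the price is 12.5 when demand is more elastic and 20 when it is less elastic. That is the same direction of effect as a fall in competition raising the profit-maximising price.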

Ok, so we've established the conditions where firms with market power would increase prices (higher costs, higher demand, lower competition). However, as I note in my ECONS101 class, pricing in the real world is not as simple as that shown in the diagrams above. First, firms often don't know for sure what their demand curve is, and so they won't know for sure what their marginal revenue curve is, and so setting the price at the quantity where marginal revenue is exactly equal to marginal cost is difficult in practice. However, that doesn't mean that firms can't set a price at all. It just means that they can't always do it perfectly. A good manager has a fundamental understanding of their market, which means that they understand in relative terms how price elastic or price inelastic the demand for their product is. They can use that fundamental understanding to set the mark-up. They won't get it perfectly correct, but they shouldn't systematically get it wrong (if they did, they wouldn't be a manager for long). Then having set the price using their fundamental understanding, they adjust the price occasionally to take account of changing costs or changing market conditions. For example, they raise prices if their costs increase, or they lower prices if a new competitor opens down the street from their store.

Second, firms don't change their prices every time that market conditions change. That's because of menu costs - literally, the costs associated with changing prices. Menu costs may be low if all they require is changing some settings in the point-of-sale system, but can be higher if they require printing and attaching new price labels. Firms prefer to avoid these costs, as well as avoiding the uncertainty for consumers that constantly changing prices cause, so they tend to increase prices only infrequently.

Both of those real-world pricing problems mean that firms will often increase their prices by more than is justified by a strict accounting of an increase in their costs. Perhaps they are 'catching up' on an increase in costs from a few months earlier. For example, say that the firm has costs that go up from $10 to $12 to $15 from Month 1 to Month 2 to Month 3, but they keep their price the same at $20 from Month 1 to Month 2, and then raise it to $30 in Month 3. If you were looking for 'greedflation', you might then see evidence in favour of it between Month 2 and Month 3, when price increased by 50% but costs only increased by 25%. However, you are ignoring the previous month, when costs increased but the firm didn't change their price.
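The catch-up example can be laid out in a few lines of Python, using only the costs and prices from the paragraph above.

```python
costs = [10, 12, 15]    # Month 1, Month 2, Month 3
prices = [20, 20, 30]


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


for start, end in ((0, 1), (1, 2), (0, 2)):
    cost_change = pct_change(costs[start], costs[end])
    price_change = pct_change(prices[start], prices[end])
    print(f"Month {start + 1} to Month {end + 1}: "
          f"costs {cost_change:+.0f}%, price {price_change:+.0f}%")

# The ratio of price to cost is the same in Month 1 and Month 3.
markup_month_1 = prices[0] / costs[0]
markup_month_3 = prices[2] / costs[2]
```

Between Month 2 and Month 3 the price rises by 50% while costs rise by 25%. Over the whole period, though, both rise by 50%, and the price is twice the cost in Month 1 and again in Month 3.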

So, that's the theoretical and practical cases against 'greedflation'. I simply don't think that firms are hiding price rises behind a smokescreen of high inflation. There isn't much incentive for them to do so. Is there empirical evidence to support the idea of 'greedflation'? Quite the contrary. In the latest issue of AEA Papers and Proceedings, this article (ungated earlier version here) by Christopher Conlon (New York University) and co-authors looks at the relationship between changes in firms' mark-ups and changes in prices (as measured by the producer price index, deflated by the consumer price index). They note that:

Our starting point is the observation of Syverson (2019) that for markups defined as price over marginal costs (μ ≡ P / MC), an approximation provides

(1) ΔP ≈ Δμ + ΔMC.

Therefore, increases in markups should yield increases in prices unless they are offset by marginal costs changes.

Using annual data from CompuStat on revenue and cost-of-goods-sold for nearly 8000 firms, and covering the period from 1980 to 2018, as well as quarterly data from 2018 to 2022, Conlon et al. find that their data:

...do not reveal a strong correlation between markup and price changes during the sample periods.

In other words, price changes are not driven by changes in markups, which leaves changes in marginal costs as the explanation. That is, there is no evidence of 'greedflation'. However, that doesn't mean that changes in marginal costs are solely responsible, either. Conlon et al. note that:

A second explanation, proposed by Syverson (2019), is that if cost of goods is more similar to average costs than marginal costs, then we need to also adjust for the scale elasticity AC/MC:

(3) μ ≡ P/MC = P/AC × AC/MC

So, if prices are increasing at the same rate as mark-ups, then that could be because average costs are increasing, or it could be because the ratio of average costs to marginal costs is increasing. That would happen if there were a re-balancing of costs from variable costs (MC) to fixed costs (a component of AC). Conlon et al. offer some evidence from other studies that supports this argument. However, that's not 'greedflation' either.
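The two quoted equations are accounting identities, and a short Python sketch shows how they fit together. The firm and its numbers are hypothetical, not taken from Conlon et al.'s data. Note that equation (1) holds exactly in log changes, and only approximately in percentage changes.

```python
import math


def decompose_markup(price, average_cost, marginal_cost):
    """Split the mark-up P/MC into P/AC and the scale elasticity AC/MC."""
    price_over_ac = price / average_cost
    scale_elasticity = average_cost / marginal_cost
    markup = price / marginal_cost
    return price_over_ac, scale_elasticity, markup


# Hypothetical firm in two periods (illustrative numbers only).
before = decompose_markup(price=20.0, average_cost=16.0, marginal_cost=10.0)
after = decompose_markup(price=22.0, average_cost=16.0, marginal_cost=11.0)

for label, (p_ac, ac_mc, mu) in (("before", before), ("after", after)):
    # Equation (3): the two components multiply back to the mark-up.
    assert math.isclose(p_ac * ac_mc, mu)
    print(label, round(p_ac, 3), round(ac_mc, 3), round(mu, 3))

# Equation (1) in log changes: dlog(P) = dlog(mu) + dlog(MC).
dlog_price = math.log(22.0 / 20.0)
dlog_markup = math.log(after[2] / before[2])
dlog_mc = math.log(11.0 / 10.0)
print(math.isclose(dlog_price, dlog_markup + dlog_mc))
```

In this made-up case the price rises by 10% while the mark-up stays at 2, so the whole price change is accounted for by marginal cost.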

The case against 'greedflation' is strong. Just because it makes a nice meme, that doesn't mean that it is true.

Friday 26 May 2023

Is good research a substitute or complement for good teaching?

University lecturers engage in two main activities: teaching, and research. Some people believe that the two activities are complements. For example, higher-quality research is associated with a better or deeper understanding of the discipline, which can then be passed onto students with higher-quality teaching. On the other hand, teaching and research may be substitutes. Academics have limited time to devote to each activity, and naturally spending more time on one means less time devoted to the other. That would suggest that higher-quality research would be associated with lower-quality teaching.

So, which is it - complements or substitutes? Past studies I've written about (see here and here) haven't provided good evidence either way. So, I was interested to read this 2018 article by Ali Palali (CPB Netherlands Bureau for Economic Policy Analysis) and co-authors, published in the journal Economics of Education Review (ungated earlier version here). They first provided a much more thorough explanation than mine above of the mechanisms that might relate teaching and research:

The first type of mechanisms suggests a positive relationship between research quality and teaching quality via complementarity between skills... Conducting research can both enhance proficiency of the teacher in the subject and keep him up-to-date with the latest developments in the discipline. As a result, research activities have a positive impact on teaching quality...

The second set of mechanisms suggests a negative relationship between research quality and teaching quality. Both research and teaching activities require investment of time and effort. Time and effort spend on research reduces the amount of time and effort that can be spent on teaching, unless some activity benefits both research and teaching (e.g. reading a scientific paper can simultaneously contribute to research ideas and to teaching preparation)... A negative relation between research and teaching can also result if (contrary to the first set of mechanisms) teaching and research require a different set of skills. If research requires more specific skills (e.g. synthesis, deduction) than teaching (e.g. communication, mentoring), this can lead to disparities between skill transfers.

Palali et al. used data on student performance from over 9000 students in the BA and MA programmes at Maastricht University in the Netherlands over the period from 2008 to 2013, essentially identifying the relationship between student performance (measured by grades) and the research quality of their teachers. This approach is valid because, after students have chosen their courses:

The Scheduling Department at SBE allocates students into tutorial groups using a computer program. Once the online registration is closed, all students taking the same course are randomly assigned to tutorial groups by a computer program. Subsequently, tutorial teachers are randomly assigned to tutorial groups within a course...

The randomisation ensures that good students are not systematically paired with good teachers (or good researchers, for that matter), and means that the results of the analysis are plausibly causal, rather than simply correlations. They measure research quality using research publication, which in the first instance is a dummy variable that captures whether each academic has any research publications in the previous four years, or alternatively a measure of the total number of research publications in the previous four years. They also use measures of quality based on a dummy variable for whether each academic has any publications in journals rated 'A', 'B', or 'C' (in a classification used at Maastricht University). In their analysis, Palali et al. find that:

Only for master students a positive effect of this research quality measure is found on student grades. Students of teachers with at least one publication the past 4 years have on average 0.35 (in a scale of 0–10) higher grades than those of teachers with no publications in the past 4 years...

...the coefficient estimate for the total number of publications in the last 4 years shows that the total number of publications has no effect on student performance.

Those two measures mostly ignore research quality. However, moving onto their other measures, Palali et al. find that:

The coefficient estimate for master students shows that there is a significant positive effect on student grades for master students. Having a teacher with at least one A level publication in the last four years in associated with a 0.43 higher student grade. This suggests that in master programs students taught by teachers with high quality publications perform better, but students of teachers with many publications do not. Thus, quality seems to be more important than quantity.

So, overall, the results suggest that research and teaching are complements, but only for postgraduate (Masters-level) study. Why might that be? Palali et al. suggest that:

Most of the courses in bachelor programs are mandatory courses at the introductory level. Master courses, on the other hand, are more often elective courses, and are more specialized courses on a specific topic, and followed by students that are more interested and motivated. It is also generally the case that teachers give special topic courses which primarily focus on their field of interest. This can increase the effects of skill transfers and the effects of interactions between teachers and students.

On the other hand, Palali et al. also find little evidence for any relationship between research quality and student evaluations of teaching (although such evaluations have their own problems - see here and the links at the bottom of that post).

So, should we conclude that research and teaching are complements, or that there is no relationship between them? Before we conclude, we need to note that there is a problem with this analysis. Higher-quality teaching should manifest in students doing better in their subsequent studies, not just in the particular course they are studying in at the time. Higher student grades in courses taught by better researchers could simply mean that better researchers grade their students more generously (perhaps so they don't have to spend time on student complaints, and can therefore devote more time to high-quality research). The effect on future grades is relatively easy to check for (such as in studies on teacher value-added, see here). When Palali et al. look at future grades, they find that:

...there are no dynamic effects. Although coefficient estimates are positive, they are small in magnitude and insignificant.

So overall, it remains difficult to say whether good research and good teaching are complements or substitutes.

Thursday 25 May 2023

The minimum legal purchase age for alcohol and crime

On 1 December 1999, New Zealand lowered the minimum legal purchase age (MLPA) for alcohol from 20 years to 18 years. That set in place an interesting natural experiment for how legal access to alcohol affects young people's behaviour. I've blogged before about research showing the effects on hospitalisations (where there appears to have been an increase, but it's unclear how large or otherwise the increase was) and alcohol-involved motor vehicle crashes (where the evidence is weak). What about the effects on youth crime in New Zealand? After all, the research I referred to in this post a couple of months ago seemed to demonstrate about a 15.7 percent increase in crime at the age where alcohol becomes legally available (age 16 in Germany).

In a recent article published in the journal Oxford Bulletin of Economics and Statistics (open access), Kabir Dasgupta, Alexander Plum, and Christopher Erwin (all Auckland University of Technology) looked at how youth crime changed when the MLPA decreased (see also their non-technical summary in The Conversation). They use administrative data from Statistics New Zealand's Integrated Data Infrastructure, and look at monthly periods for young people in 1994-1998 (when the MLPA was 20 years) and 2014-2018 (when the MLPA was 18 years). Specifically, they look at what happens to crime rates before and after age 20 for both groups, using a regression discontinuity design. They also employ a difference-in-differences approach, comparing youth aged 18-19 years with youth aged 20-22 years, before and after the change in MLPA.
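For readers unfamiliar with difference-in-differences, the core calculation is simple enough to sketch in Python. The conviction rates below are invented purely to show the arithmetic. They are not from Dasgupta et al., whose actual analysis is regression-based.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus the change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical conviction rates per 100,000 population (not from the paper).
effect = difference_in_differences(
    treated_before=50.0, treated_after=53.0,   # ages 18-19, gained purchase rights
    control_before=60.0, control_after=62.0,   # ages 20-22, no change in rights
)
print(effect)
```

Here the treated group's rate rises by 3 and the control group's by 2, so the estimated effect is 1 conviction per 100,000. The control group's change stands in for what would have happened to 18-19 year olds without the law change.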

Their first method (regression discontinuity) is probably best summarised by Figure 1 from the paper. First, the results for when the MLPA was 20 years (in 1994-1998):

Notice that there is a big decrease in court charges (their measure of crime) at age 20, but that drop is entirely driven by a decrease in age-dependent traffic offenses (which are mostly violations of drink-driving laws, where the breath alcohol limit differs for drivers younger than age 20). Indeed, their regression analysis shows that:

Overall, focusing on the more comparable age-independent crime indicators, we do not find any significant change in alcohol-related crimes when the MLPA was 20.

As for the more recent sample, when the MLPA was 18 years (in 2014-2018):

The pattern looks fairly similar, with the same drop in crime at age 20, driven by a large decrease in age-dependent traffic offenses. At first glance, there doesn't appear to be an up-tick in crime at age 18 (which is the MLPA in this sample). However, based on the regression analysis Dasgupta et al. note that:

At the relevant MLPA threshold, we do not find any statistically discernible change either in the overall measure or in the age-independent measure of alcohol-related crime...

Looking at alcohol-induced traffic convictions, we do not observe any significant effect for the age-independent measure, which excludes BBAC limit violations. There is however a statistically significant (at the 1% level) jump in age-dependent traffic convictions at the 18-year age threshold. Specifically, an increase of approximately 7.5 convictions per 100,000 population ... indicates that gaining alcohol purchasing rights triggers a rise in infractions of mandated BBAC limits applicable to youth below 20. Compared with the sample mean of groups just under the relevant MLPA, the linear RD coefficient represents a 27 % (7.5×100/27.4) increase in alcohol-induced age-dependent traffic convictions.

So, there is a small increase in traffic violations at age 18 in this sample, which is likely to be mostly an increase in age-related drink driving offences. In further analysis, Dasgupta et al. show that:

...there is a significant increase in age-dependent traffic convictions among individuals residing in non-urban (rural) locations and those living in socio-economically more deprived neighbourhoods.

So, the results are concentrated among youth who live in poorer and more rural locations. However, before we get too carried away, we need to recognise that Dasgupta et al. are comparing a sample from 1994-1998 with a sample from 2014-2018. That wouldn't be my first choice of comparison group, not least because youth drinking norms have been changing over time (see here and here). It would be better to compare groups that are closer in time, and that's what Dasgupta et al. do in their difference-in-differences analysis (which they only present as a bit of a robustness check late in the paper). In that analysis, they report that:

...we do not find any significant evidence of an increase in criminal activities in the post-1999 period for young individuals who gained purchasing rights for the first time.

Dasgupta et al. don't actually show the results of that analysis in the paper, which is a bit disappointing, as I would have found it more compelling than the results they actually did report. Overall, they conclude that their results show:

...little evidence that late adolescents commit more alcohol-related crimes upon crossing over the legal purchasing age in New Zealand.

That's not quite true for when the drinking age is 18, but if most of the increase in crime at age 18 is drink driving violations that relate to the lower breath alcohol limit for those aged under 20, it isn't clear that there's a strong case for an increase in harm. However, although statistically insignificant, there is an apparent increase in crime overall at age 18 in the regression discontinuity results. It is relatively small, at 4 convictions per 100,000 population, or a 7.6 percent increase. That's not insubstantial. I'd be much more cautious about claiming that there is 'little evidence' of an increase in crime. With the prevalence of youth drinking way down, it's not at all clear how much statistical power an analysis that looks at all young people has. It would be interesting to see what the results would be, if limited only to youth who drink.


Tuesday 23 May 2023

Communism, and alcohol consumption in Eastern and Southern Europe

Russia has one of the world's highest alcohol-related death rates. In fact, if you look at deaths from alcohol use disorders in Europe, there is a clear East-West gradient. Death rates are higher in the East, and lower in the West (source):

What causes this difference in death rates? The obvious culprit is alcohol consumption (duh!). But when you look at alcohol consumption per capita, it is actually slightly higher in Western Europe than in Eastern Europe:

What gives? Is there something different about the way that people drink in Eastern Europe, compared with Western Europe, that explains higher death rates in the East than the West, even though there is more drinking in the West?

That is what I initially hoped I would learn from this 2018 article by Gintare Malisauskaite and Alexander Klein (both University of Kent), published in the Journal of Comparative Economics (ungated version here). They look at the effect of exposure to Communism on alcohol consumption (but not alcohol-related mortality). Their premise may be a little bit faulty though, since they start from a suggestion that drinking is higher in the East, when the data above show the opposite (charitably, I guess we could say this fact is contested).

Malisauskaite and Klein use data on around 36,000 people from the European Health Interview Survey (EHIS), collected between 2006 and 2009. The number of countries included is somewhat limited:

Due to the availability of alcohol consumption data, countries included in our estimations were: Cyprus, Greece, Malta (Western) and Bulgaria, Czech Republic, Hungary, Poland, Romania, Slovenia, Slovakia (Eastern).

Do Cyprus, Greece, and Malta count as Western Europe? I guess, maybe? Personally, I'd probably refer to them as Mediterranean or Southern Europe (as I did in the title to this post), rather than Western Europe. Looking at the map of alcohol consumption, those three countries definitely have lower alcohol consumption than Western Europe, and Eastern Europe as well. The big drinking Western European countries (Germany, Ireland, and France) are all excluded. That might be the basis for Malisauskaite and Klein's initial claim of higher drinking in Eastern Europe.

Anyway, Malisauskaite and Klein look at the relationship between alcohol consumption (and binge drinking) and exposure to Communism. They measure exposure to Communism in two ways:

...one indicating whether an individual lived in the Eastern Bloc between age 18 and 25, the other the number of years lived in a communist regime.

Looking at men and women separately, they find that:

...both variables capturing exposure increase the probability that women consume alcohol more frequently. In the case of men, the number of years spent under communism has a sizeable significant effect. Overall, however, the effect of communism on alcohol consumption frequency is larger for women. Binge drinking, on the other hand, shows interesting gender differences: communist regimes have no effect on women's binge drinking behaviour, but affect men. Binge drinking results for men suggest that exposure during formative years could play a more important role than number of years spent in the regime.

The effects are pretty small. For example, growing up in a Communist country increases the probability of drinking every day by 0.4 percent for both women and men. Certainly, that's not enough to explain much of the mortality gap, even setting aside the unusual comparison group. I'm not sure that this research really adds much to our understanding of these differences, especially since, as Malisauskaite and Klein acknowledge in their conclusion:

...we cannot pinpoint the exact reason why or the way in which experiencing communist regimes could have influenced drinking norms...

So, we really don't know how much exposure to Communism contributes to drinking. A more appealing approach would be to look within, rather than between, countries. For example, research could compare Germans on either side of the border, before and after the fall of the Berlin Wall. That abstracts somewhat from underlying cultural differences, although there are other differences between East and West Germany that would be difficult to control for. It's not perfect, but the results might be more defensible. If only those data were available.

Monday 22 May 2023

New economic ideas, and the rift between academic economists and government economists

On NZAE's Asymmetric Information blog yesterday, Dennis Wesselbaum reported on the latest (sixth) NZAE member survey. The theme of the survey was views on Kate Raworth's Doughnut Economics (which I reviewed here), Mariana Mazzucato's Mission Economy (which I reviewed here), and the circular economy. The results are eye-opening:

One takeaway from this survey is that academic economists disagreed that these concepts are improving economic policy analyses, and that these concepts should be taught as part of the economics curriculum at all. In the words of one respondent: “As an economics lecturer, I am loathe to introduce non-science based concepts into the curriculum. (This is the same reason I/we are reluctant to teach Modern Monetary Theory - another popular set of non-scientific econ "meme" theories)”.

However, the exact opposite is true for government economists. While the survey is non-representative for either group, this suggests an important and potentially dangerous divergence of views about the usefulness of these fringe concepts.

I'm not sure that I would go so far as to label the difference as 'dangerous', but certainly noteworthy. As one of the academic economist respondents to the survey, I thought it worth adding a bit of additional context.

Consider Question 2 from the survey: "To the extent that [each of Doughnut Economics, Mission-based economics and the Circular Economy] has been incorporated into economic policy analysis by Ministries and Agencies, it has improved the quality of that analysis." Here are the resulting graphs, summarising the survey results for that question:


On the right-hand graphs, the purple bars are academic economists' views, and it is clear that there is a strong belief that each of doughnut economics, mission economy, and the circular economy has reduced the quality of economic policy analysis. Mission economy is viewed a little less negatively than the other two, and that is pretty much how I ranked them in my responses. For the government economists (the green bars in the graph), uncertain was the modal response. Aside from being uncertain, government economists tend to have positive views about the impact of these ideas on the quality of economic policy analysis.

This divergence of views makes me wonder about the underlying cause. There is probably little difference in the undergraduate economics training between government economists and academic economists, but academic economists are more likely to have a PhD. However, it seems unlikely that the PhD makes that much of a difference. In what other ways are academic economists and government economists different? Perhaps academic economists have a generally more sceptical or critical viewpoint of new ideas that do not yet have robust empirical support (which is arguably the case for all three ideas)? Or, perhaps academic economists, stuck in their ivory towers, are simply out of touch with the latest ideas? Either of those characterisations of academic economists could be accurate. On the other hand, perhaps government economists are more likely to 'toe the party line', supporting the latest fads or fashions that the government is in favour of? Or perhaps there is a survivorship bias, in that government economists who disagree with the ideas that are currently in favour depart government jobs for the private sector (or academia)? Either of those characterisations of government economists could be accurate. Or, perhaps, a mixture of all of those characterisations explains the difference in views between academic economists and government economists - that would be my guess.

We don't know the cause of the differences, but there is at least a little bit of support for the argument that academic economists are generally more sceptical about new ideas:

The final question asked whether these concepts should be taught as part of the core syllabus in economics... We again find substantial differences between academia and government respondents: academics tend to disagree, while government economists tend to agree that these concepts should be taught.

Even if we don't believe in these ideas, there is something to be said for at least exposing students to them. Students should know that these ideas exist, and are currently influential in policy circles. If we want our students to get good government economic policy jobs, and these ideas are important for economists in those jobs to understand, we should at least be teaching our students to understand them. And we should be teaching students how to critically examine these ideas, so that students (and future government economists) have realistic expectations about what should be contributing to sensible economic policy analysis.

[Update: Eric Crampton makes many of the same points here]

Sunday 21 May 2023

ChatGPT passed the Test of Understanding of College Economics

You have probably seen the news that ChatGPT has passed law exams in Minnesota, MBA exams in Pennsylvania, or the US medical licensing exam. Business Insider even provided a list back in March of tests and exams that ChatGPT had been able to pass. High school level economics (or, rather, AP microeconomics and AP macroeconomics) were on the list. Up to now though, university-level economics hadn't made the list.

Based on this article, by Wayne Geerling, Dirk Mateer (both University of Texas at Austin), Jadrian Wooten (Virginia Polytechnic Institute), and Nikhil Damodaran (OP Jindal Global University), forthcoming in the journal The American Economist (open access), that is about to change. They tested ChatGPT using version 4 of the Test of Understanding of College Economics, a widely used multiple-choice test that covers both microeconomics and macroeconomics (technically, they are separate tests, but they can be completed together).

Geerling et al. inputted the multiple-choice questions into ChatGPT and coded whether its response was correct or not (if ChatGPT gave more than one answer, it was marked as incorrect). And ChatGPT did incredibly well, in line with the results from the other tests and exams noted above:

In our trial, ChatGPT answered 19 of 30 microeconomics questions correctly and 26 of 30 macroeconomics questions correctly, ranking in the 91st and 99th percentile, respectively.

Geerling et al. title their article: "ChatGPT has Aced the Test of Understanding in College Economics: Now What?". I'm not sure I would go so far as to say that ChatGPT 'aced' the test, as it did get several questions wrong (especially in microeconomics). However, no doubt ChatGPT will improve, and it would be interesting to see how GPT-4 would go, not least because it can handle visual inputs.

The question, "now what?" is important for teachers and lecturers everywhere. What do we do when ChatGPT can answer multiple-choice economics questions better than the average student? Geerling et al. offer only a few suggestions:

The emergence of ChatGPT has raised fears about widespread cheating on unproctored exams and other assignments. The short-term solution for many educators involves returning to in-person, proctored assessments...

Beyond this back to the future approach, there are other techniques that can be utilized in an online environment. Assessments that are time-constrained reward students who know the material, while others who do not know the material as well search their notes, ask their classmates, and seek answers through any means (including ChatGPT). The time spent searching means that they cannot complete as many questions, even if they are successful in obtaining the information...

One popular recommendation among the teaching community so far has been to produce ChatGPT responses with errors and have students work in small groups to identify and correct those errors. In essence, students are asked to “fact check” the system to ensure that the responses are accurate...

I think we're only just beginning to understand what is possible in this space, and how teachers and students interact with large language models is naturally going to evolve over time. The best uses of AI for teaching and learning probably haven't even been discovered yet. Moreover, as Geerling et al. note in their conclusion:

It is important to note that ChatGPT is not the only disruptive technology in education. The advent of artificial intelligence in education is a reality that cannot be ignored, and it is time to embrace the new era with innovative and effective assessment strategies.

Indeed.

[HT: Mark Johnston at the Econfix blog]


Saturday 20 May 2023

How not to analyse the relationship between climate and international migration

I've done research before on the relationship between climate and migration (see this post, and the paper published here, or ungated here). So, I was really interested to read this new article by Dennis Wesselbaum (University of Otago), published in the journal Letters in Spatial and Resource Sciences (open access). Wesselbaum uses data on migration flows from 198 countries to 16 OECD countries, along with temperature data from the Berkeley Earth database, and weather-related disasters data from the EM-DAT (international disasters) database. Controlling also for GDP, population, political freedom, life expectancy, and share of agricultural land, he finds that:

...temperature, but not weather-related disasters, have a significant direct effect on migration in our sample. Temperature has a smaller effect on migration towards OECD countries in Asia compared to Europe, Africa, and North America. For disasters, we only find a stronger effect on migration in Asia compared to Africa. Temperature matters in most regions while disasters do not.

However, as the Economics Discussion Group students and I discussed in our most recent session, there are two key statistical problems with Wesselbaum's analysis. The first is the way that migration flows equal to zero (of which there are likely to be many) are dealt with. Because the dependent variable is the log of migration, and the log of zero is undefined, Wesselbaum deals with this by "adding one to all flows". That creates a problem of bias, as I noted in this recent post. Most migration researchers have instead adopted the Poisson pseudo-maximum likelihood (PPML) approach (see this working paper, for example), as it not only copes with zero values, but also deals with over-dispersion.
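To get a feel for why "adding one to all flows" can distort things, here is a rough sketch using made-up flows (these are not Wesselbaum's data, and `log_change` is just a helper I've invented for illustration). Each flow doubles, so the true log change is the same in every case, but the log change after adding one is noticeably smaller when the flows are small:

```python
import math

def log_change(before, after, add_one=False):
    """Log change in a flow, optionally after adding one to each flow first."""
    shift = 1 if add_one else 0
    return math.log(after + shift) - math.log(before + shift)

# Hypothetical migration flows: each one doubles, so the true log change is log(2).
for before in (1, 10, 1000):
    exact = log_change(before, 2 * before)
    shifted = log_change(before, 2 * before, add_one=True)
    print(f"{before:>5} -> {2 * before:>5}: exact {exact:.3f}, after adding one {shifted:.3f}")
```

The distortion shrinks as flows get larger, so how much it matters in practice depends on how many small (and zero) flows there are in the data.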

The second issue is likely to be more problematic. The three key variables in the analysis (migration, temperature variation, and weather-related disasters) are all trended over time. When you run an analysis with a long time-series (or, as in this case, a long panel dataset), then time trends in the variables can lead to spurious correlations. That's the reason why per capita cheese consumption is highly correlated with the number of deaths by bedsheet entanglement:


Two variables that are both trended over time will tend to look like they are closely correlated, even when a change in one of the variables does not cause a change in the other. Even when you use more complicated statistical methods, this remains a problem. To see why that may be a problem here, consider Figures 1-3 from Wesselbaum's paper:



Notice how all three variables have an upward trend. Economists refer to these time series as being non-stationary (which essentially means that the mean value of the variable is not constant over time). That doesn't mean for certain that there are problems in Wesselbaum's analysis, but it does mean that he should have tested for non-stationarity in the variables. If time series variables are found to be non-stationary, a simple solution can be to take first-differences (so that each variable would then be the difference between its value at time t, and its value at time t-1). Since Wesselbaum doesn't report the tests for stationarity, we have no way of knowing how serious the problems are, and the risk is that the correlation he identifies is simply spurious, and driven entirely by the time trends in the data.
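As a rough illustration of both the problem and the first-differencing fix, here is a small simulation. Everything in it is an assumption for the sake of the example: two series that each drift upward by one unit per period plus independent random noise, so by construction neither one causes the other. It is not a re-analysis of Wesselbaum's data.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def first_difference(series):
    """Value at time t minus value at time t-1."""
    return [b - a for a, b in zip(series, series[1:])]

def random_walk_with_drift(n, drift, rng):
    """Non-stationary series: each period adds the drift plus independent noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(2023)  # fixed seed so the run is repeatable
series_a = random_walk_with_drift(200, 1.0, rng)
series_b = random_walk_with_drift(200, 1.0, rng)  # independent of series_a

levels_corr = pearson(series_a, series_b)
diff_corr = pearson(first_difference(series_a), first_difference(series_b))
print(f"correlation in levels: {levels_corr:.2f}")
print(f"correlation in first differences: {diff_corr:.2f}")
```

The correlation in levels comes out very high purely because both series trend upward, while the correlation in first differences is close to zero. A formal test for stationarity would be the proper first step in real research, and this sketch doesn't attempt one.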

This is not the way to analyse these data. However, it does open an opportunity for a good Honours or Masters student to replicate the analysis with a better approach.


Friday 19 May 2023

Robert Lucas, 1937-2023

This week we lost another of the 20th Century's greatest economists, Robert Lucas. Lucas won the Nobel Prize in 1995, "for having developed and applied the hypothesis of rational expectations, and thereby having transformed macroeconomic analysis and deepened our understanding of economic policy". If anything, that radically understates the contribution of Lucas to the study of macroeconomics, and to economics more generally.

Lucas will become familiar to my ECONS101 students in the next couple of weeks, or rather his critique of the Phillips Curve will become familiar to them. Now, we might quibble about the 'rational' part of rational expectations theory, but regardless the addition of explicit consideration of expectations led to a revolution in macroeconomic modelling. The importance of that contribution, along with the many other changes that Lucas brought about in the way that economists think about and model the macroeconomy, are explored in this blog article by John Cochrane (see Cochrane's more personal reflections here as well).

Cochrane labels Lucas "the most important macroeconomist of the 20th century". He is not alone in this high praise. The Economist described Lucas in an article yesterday as "a giant of macroeconomics", while David Henderson in the Wall Street Journal called him "a giant in the field". A number of tributes will no doubt be written over the coming days and weeks. The New York Times has a nice obituary, and Tyler Cowen shared his thoughts in this Bloomberg article (paywalled). Lucas had a huge impact on economics and economists. He will be missed.

Wednesday 17 May 2023

The impact of generative AI on contact centre work

I've written a couple of times about the impact of ChatGPT on the labour market (see here and here). Both times, I noted that we needed to investigate real-world cases of the impact of large language models on particular jobs. To understand why that is important, it is worth first considering why we can't be sure about the impact of a particular technology on employment.

New production technologies (by which, I mean any technology that is used by workers) make workers more productive. That could have one of two effects on employment among the workers affected by the technology. On the one hand, more productive workers generate more production for their employer. It increases the marginal product of labour (the amount of production that the marginal worker generates, or the amount of additional production that the employer receives by employing one more worker [*]). Assuming that doesn't affect the price of the output that the workers produce, making workers more productive increases the value of the marginal product of labour (which is the marginal product of labour multiplied by the value of the output produced [**]). Since workers are generating more value for the employer, the employer would want to employ more of them, and keep adding additional workers until the value of the marginal product of labour is equal to the wage. So, in this case, new technology increases employment - technology is labour-augmenting.

On the other hand, if the employer has a fixed amount of production to generate, then having more productive workers means that fewer workers are required in order to complete all of the production. In that case, new technology decreases employment - technology is labour-replacing. However, displaced workers may find employment elsewhere, and the technology may create new job opportunities. So, even if technology is labour-replacing, there is no certainty that it reduces employment overall.
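A small numerical sketch may help to make the two cases concrete. The production schedule, price, wage, and output target below are all invented for illustration (the nth worker adds 12 - n units, scaled by a productivity factor), so treat this as a toy example rather than a model of any real employer:

```python
def marginal_product(n, productivity):
    """Output added by the n-th worker (hypothetical schedule, diminishing in n)."""
    return productivity * max(12 - n, 0)

def workers_hired(price, wage, productivity):
    """Labour-augmenting case: keep hiring while the value of the marginal
    product of the next worker (price x marginal product) covers the wage."""
    n = 0
    while price * marginal_product(n + 1, productivity) >= wage:
        n += 1
    return n

def workers_needed(target_output, productivity, max_workers=12):
    """Labour-replacing case: the fewest workers that reach a fixed output target."""
    total, n = 0.0, 0
    while total < target_output and n < max_workers:
        n += 1
        total += marginal_product(n, productivity)
    return n

# Assumed numbers: output price 5, wage 28, and technology that raises productivity by 50%.
print(workers_hired(price=5, wage=28, productivity=1.0))    # before the technology
print(workers_hired(price=5, wage=28, productivity=1.5))    # after: more workers hired
print(workers_needed(target_output=40, productivity=1.0))   # before the technology
print(workers_needed(target_output=40, productivity=1.5))   # after: fewer workers needed
```

The same productivity gain raises employment in the first pair of results and lowers it in the second. The only thing that differs is whether the employer's output is free to expand or is fixed.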

Is ChatGPT labour-augmenting, or labour-replacing? It is too early for us to tell definitively. We need to look at the data. One thing we can be pretty sure about is that generative artificial intelligence (AI), of which ChatGPT and large language models are just one example, is going to increase worker productivity in some jobs. The work I cited in those earlier posts gave us a good idea of which jobs were most exposed, and this new NBER Working Paper (ungated version here) by Erik Brynjolfsson (Stanford University), Danielle Li, and Lindsey Raymond (both MIT), tells us about the potential productivity gains.

Specifically, Brynjolfsson et al. used the staggered introduction of a generative AI tool in a contact centre setting to look at its effects on worker productivity. As they explain:

We examine the staggered deployment of a chat assistant using data from 5,000 agents working for a Fortune 500 software firm that provides business process software. The tool we study is built on a recent version of the Generative Pre-trained Transformer (GPT) family of large language models developed by OpenAI... It monitors customer chats and provides agents with real-time suggestions for how to respond. It is designed to augment agents, who remain responsible for the conversation and are free to ignore its suggestions.

Brynjolfsson et al. look first at the effects of the chat assistant on the number of chats that an agent is able to successfully resolve per hour. They find that:

...deployment of AI increases RPH [Resolutions Per Hour] by 0.30 calls or 13.8 percent.

They also find:

...a 3.8 minute decrease in the average duration of customer chats, a 9 percent decline from the baseline mean (shorter handle times are generally considered better)... a 0.37 unit increase in the number of chats that an agent can handle per hour. Relative to a baseline mean of 2.6, this represents a roughly 14 percent increase. Unlike average handle time, chats per hour accounts for the possibility that agents may handle multiple chats simultaneously. The fact that we find a stronger effect on this outcome suggests that AI enables agents to both speed up chats and to multitask more effectively... a small 1.3 percentage point increase in chat resolution rates, significant at the 10 percent level. This effect is economically modest, given a high baseline resolution rate of 82 percent; we interpret this as evidence that improvements in chat handling do not come at the expense of problem solving on average. Finally... no economically significant change in customer satisfaction, as measured by net promoter scores: the coefficient is -0.13 percentage points and the mean is 79.6 percent.

All of that suggests that the contact centre workers are more productive. However, the effects are heterogeneous by skill level and tenure. When they stratify their sample by skill level (as measured in the quarter prior to adoption of the AI system), Brynjolfsson et al. find that:

...the productivity impact of AI assistance is most pronounced for workers in the lowest skill quintile... who see a 35 percent increase in resolutions per hour. In contrast, AI assistance does not lead to any productivity increase for the most skilled workers...

And similarly, when they stratify their sample by tenure:

We see a clear, monotonic pattern in which the least experienced agents see the greatest gains in resolutions per hour.

Taken all together, this means that the AI system narrows the gap between high-quality and low-quality workers, and the gap between workers with more experience and workers with less experience. In fact, Brynjolfsson et al. note that:

AI helps new agents move more quickly down the experience curve... agents with two months of tenure and access to AI assistance perform as well as or better than agents with more than six months of tenure who do not have access.

When Brynjolfsson et al. look at the actual text of interactions between agents and customers, they find:

...suggestive evidence that AI assistance leads lower-skill agents to communicate more like high-skill agents.

They also find suggestive evidence that customers are happier (as measured by the sentiment in the conversations), and that the improvements in sentiment are greater for workers with lower skill, and workers with less tenure. Finally, we learn that:

...on average, the likelihood that a worker leaves in the current month goes down by 8.6 percentage points... We find the strongest reductions in attrition among newer agents, those with less than 6 months of experience. The magnitude of this coefficient, around 10 percentage points, is large given baseline attrition rates for newer workers of about 25 percent... we find a significant decrease in attrition for all skill groups, but no systematic gradient.

So, the employer has fewer workers leaving (lower attrition), but the effect on employment also depends on how many new workers they hire as well, which we don't know. The last point, about no differences in attrition by skill, is important. That's because, earlier in the paper, Brynjolfsson et al. note that:

Agents are paid an hourly wage and bonuses based on their performance relative to other agents.

If the workers are paid for their performance relative to other agents, and the AI makes lower-quality and shorter-tenured agents perform better, that will tend to increase the wages of lower-quality and shorter-tenured agents, and consequently decrease the wages of higher-quality and longer-tenured agents. So, I was a little surprised that there wasn't more attrition among the higher-skilled workers at least. And that would be a problem looking forward, because it is the interactions between high-quality workers and customers that you want to use to train future AI models. If the higher-skilled workers leave, then there will be lower-quality training data available.
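To illustrate that logic, here is a minimal sketch. The paper doesn't describe the bonus formula, so I'm assuming a hypothetical scheme where a fixed bonus pool is split in proportion to each agent's resolutions per hour. The baseline numbers are also made up, and the 35 percent gain for the low-skill agent simply borrows the figure quoted above for the lowest skill quintile:

```python
def bonus_shares(resolutions_per_hour, pool=100.0):
    """Split a fixed bonus pool in proportion to each agent's resolutions per hour.
    This is a hypothetical scheme, not the one used by the firm in the paper."""
    total = sum(resolutions_per_hour.values())
    return {agent: pool * rph / total for agent, rph in resolutions_per_hour.items()}

# Assumed baseline productivity for two illustrative agents.
before = bonus_shares({"low_skill": 1.5, "high_skill": 3.0})

# With AI: the low-skill agent improves by 35 percent, the high-skill agent not at all.
after = bonus_shares({"low_skill": 1.5 * 1.35, "high_skill": 3.0})

for agent in before:
    print(f"{agent}: bonus {before[agent]:.1f} before, {after[agent]:.1f} after")
```

Under those assumptions, the high-skill agent's bonus falls even though their own performance hasn't changed. If the real bonus pool grows with total performance, the effect would be smaller, or might not arise at all.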

So, what we learn from this paper is that there are relatively large productivity increases among these contact centre workers. And those workers are clearly satisfied with their job changes, as they don't leave their job as often. Those two effects mean more profits for the employer (better quality customer service, and lower costs of replacing workers).

Coming back to the question we started this post with, what does that mean for employment overall? Unfortunately, we don't get a straight answer to that question. Fewer workers are leaving the employer, but we don't know if the employer is offsetting that by hiring fewer new workers. It looks like we're going to have to wait for additional future research in order to better understand the impacts of generative AI on employment.

[HT: Marginal Revolution]


*****

[*] Or, the amount of additional production that the employer receives from one additional labour hour. It doesn't really matter which way we define the marginal product of labour. The rest of the explanation works the same. I just find it a bit easier to talk about marginal productivity in terms of whole workers, rather than labour hours.

[**] This is also known as the marginal revenue product of labour.

Sunday 14 May 2023

More on ChatGPT and the labour market

Last month I posted about ChatGPT's impact on the labour market (based on this working paper), and concluded that:

...Zarifhonarvar's analysis is very crude and fairly speculative. We can expect some more thorough analyses, including the first studies using real-world data, to become available before long.

This literature is moving fast. Shortly after the Zarifhonarvar paper was released, this working paper by Tyna Eloundou (OpenAI) and co-authors became available. They answer a similar research question to Zarifhonarvar, identifying the jobs that are more likely to be impacted by large language models (LLMs). Specifically:

We present our results based on an exposure rubric, in which we define exposure as a measure of whether access to a GPT or GPT-powered system would reduce the time required for a human to perform a specific [Detailed Work Activity] or complete a task by at least 50 percent.

Eloundou et al. use data from the O*NET database, which:

...contains information on 1,016 occupations, including their respective Detailed Work Activities (DWAs) and tasks. A DWA is a comprehensive action that is part of completing task, such as "Study scripts to determine project requirements." A task, on the other hand, is an occupation-specific unit of work that may be associated with none, one, or multiple DWAs.

The dataset contains 19,265 tasks and 2,087 DWAs, and they use humans to code each DWA and a subset of tasks to an exposure level (no exposure; direct exposure if an LLM could reduce the time required for a DWA or task by half; or LLM+ exposure if additional software could be developed that would allow an LLM to reduce the time required by half). In a particularly fitting research method, they then use GPT-4 to code all of the DWAs and tasks. The human and GPT-4 ratings are quite similar (for those DWAs and tasks that were coded by both).

Then, Eloundou et al. classify jobs in terms of their exposure to ChatGPT, based on the DWAs and tasks associated with each job. They find that:

...approximately 19% of jobs have at least 50% of their tasks exposed when considering both current model capabilities and anticipated tools built upon them... Accounting for other generative models and complementary technologies, our human estimates indicate that up to 49% of workers could have half or more of their tasks exposed to LLMs.
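The basic logic of going from task-level exposure to job-level exposure can be sketched as follows. The jobs and the exposed/not-exposed labels here are entirely invented, and the actual paper uses a more detailed rubric and weighting than this simple share:

```python
def share_exposed(task_labels):
    """Share of a job's tasks that are labelled as exposed (True)."""
    return sum(task_labels) / len(task_labels)

# Hypothetical jobs, each with made-up exposure labels for four tasks.
jobs = {
    "Job A": [True, True, True, False],
    "Job B": [True, True, False, False],
    "Job C": [False, False, False, False],
}

# A job counts as highly exposed if at least half of its tasks are exposed.
highly_exposed = [job for job, labels in jobs.items() if share_exposed(labels) >= 0.5]
share_of_jobs = len(highly_exposed) / len(jobs)

print(highly_exposed)
print(f"{share_of_jobs:.0%} of these jobs have at least half of their tasks exposed")
```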

But which jobs are most exposed? Eloundou et al. note that:

Occupations with higher wages generally present with high exposure, a result contrary to similar evaluations of overall machine learning exposure... When regressing exposure measures on skillsets using O*NET’s skill rubric, we discover that roles heavily reliant on science and critical thinking skills show a negative correlation with exposure, while programming and writing skills are positively associated with LLM exposure...

We analyze exposure by industry and discover that information processing industries... exhibit high exposure, while manufacturing, agriculture, and mining demonstrate lower exposure.

There is nothing too surprising in these results, and they accord well with the earlier work by Zarifhonarvar, albeit using research methods that are more credible. Eloundou et al. conclude that "GPTs are GPTs", meaning that GPTs are a general purpose technology, in that they:

...meet three core criteria: improvement over time, pervasiveness throughout the economy, and the ability to spawn complementary innovations...

The first criterion is clearly met, as will be obvious to anyone who has followed this topic over the last six months. Eloundou et al.'s paper provides evidence for the latter two criteria.

What implications does this have? It would be tempting to infer that workers in high-wage, high-skill jobs (particularly those relying on programming and writing skills) are at risk. However, as my students noted in a recent Economics Discussion Group meeting, it is not at all clear yet whether LLMs will be a labour-replacing technology, or a labour-augmenting technology. Will LLMs replace human labour, leading to fewer of the jobs that are most exposed to them? Or will LLMs augment those jobs, making the workers radically more productive and efficient, and opening new job opportunities? In order to understand that, we need to look at real-world cases of job change resulting from LLMs. More on that in my next post (Update: see here).


Friday 12 May 2023

Book review: The Economist's Craft

There is a plethora of advice available for economics PhD students and job market candidates. For example, Chris Blattman has an excellent series of posts of advice for PhD students (not just in economics), John Cochrane provides writing tips for PhD students, Susan Athey has tips for applying to graduate school, and Pam Jakiela provides advice on the academic job market for PhD graduates. Chris Roth and David Schindler provide a bunch of links to similar sources of advice.

Despite the obvious value in all of these sources of advice, they tend to share a couple of things in common. First, with few exceptions they are very much targeted at students in top PhD programmes. Second, they are generally focused on students within the US PhD ecosystem. That makes them of limited value to students who are not in top US PhD programmes, like my students.

So, I was expecting to be disappointed in reading Michael Weisbach's 2021 book The Economist's Craft. However, I couldn't have been more wrong in my expectations. The subtitle of the book is "An introduction to research, publishing, and professional development", and it provides exactly that. Moreover, it does it in such a way that it would provide significant value for PhD students (in economics and similar disciplines) even if they are not in US PhD programmes, and not in the top programmes either. Weisbach is himself a top scholar in financial economics, and much of the advice is targeted at top students, but the underlying ideas are the same for students at all levels.

The book is organised around the PhD journey and beyond, starting with advice on choosing a research topic, writing a draft, presenting research and getting it published, and how to become a successful academic, including managing an academic career. These are the key aspects of what Weisbach refers to as the craft of an academic:

Much of what academics do can best be described as a craft. Like other kinds of craftmanship, scholarly work is a combination of time-tested techniques, strategic thinking, ethics, and imagination. These things can be learned, but they tend to be acquired in a haphazard manner.

The book does an excellent job of exposing the tacit knowledge of academia - the things that academic economists otherwise learn 'on the job', from the PhD through to the end of their academic career. The content is rich in advice, and there is far too much for me to repeat or even summarise here. Indeed, I am sure that there is something valuable in this book for every PhD student in economics, and I truly wish that this book had been around when I was just embarking on my PhD journey. In fact, I may well make an effort to give my future PhD students a copy of this book when they reach confirmed enrolment (after the first three months of their PhD journey, when their full research proposal is complete and has been approved).

If you are a current PhD student, a future PhD student, an early career researcher newly graduated with a PhD, or a seasoned academic who supervises PhD students, then this book is a must-read. Highly recommended!

Wednesday 10 May 2023

Transparency, reproducibility, and the credibility of economics research

Back in 2021, I wrote a post about the past and future of statistical significance, based on three papers published in the Journal of Economic Perspectives (see here). The third paper I referred to in that post was by Edward Miguel (University of California, Berkeley) and focused on open science and research transparency.

Miguel has a much longer 2018 article on the same topic, co-authored with Garret Christensen (also University of California, Berkeley), published in the Journal of Economic Literature (open access). The 2018 article covers much of the same ground, but in more detail. Christensen and Miguel begin by outlining the problems with the empirical literature, which they see as:

...publication bias, specification searching, and an inability to replicate results.

They survey the literature on each issue. First, publication bias occurs where the statistically significant results are more likely to be published than statistically insignificant results. This is well known already, and there is plenty of evidence of this bias across multiple fields, including economics. Christensen and Miguel summarise this literature, but pointedly note that:

Of course, and not to be facetious, one cannot completely rule out publication bias even among this body of publication bias studies.

Indeed. Second, specification searching occurs when researchers selectively report only some of the analyses that they have conducted (specifically, the analyses that lead to statistically significant results), while ignoring any other analyses that were conducted. Since some analyses will lead to statistically significant results simply by chance, specification searching means that we cannot necessarily believe the results of published studies. Christensen and Miguel single out the analysis of sub-groups as a particularly problematic type of specification searching (and one that is a pet hate of mine). The more sub-groups that researchers analyse, the more likely they are to turn up something that is statistically significant.
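
To see how quickly the sub-group problem bites, here is a minimal sketch in Python of the chance of finding at least one 'significant' result when there is no true effect in any sub-group. It assumes that the sub-group tests are independent and that each uses a 5 percent significance level. Both are simplifying assumptions of mine, not something taken from Christensen and Miguel.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests is
    statistically significant purely by chance (no true effects)."""
    return 1 - (1 - alpha) ** num_tests


for num_subgroups in (1, 5, 10, 20):
    chance = prob_at_least_one_false_positive(num_subgroups)
    print(f"{num_subgroups:>2} sub-groups: {chance:.1%}")
```

Under those assumptions, a researcher who analyses ten sub-groups has about a 40 percent chance of turning up at least one spurious significant result, and with twenty sub-groups the chance is about 64 percent.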

Third, the replication crisis is well-known in psychology, but economics has been facing a replication crisis of its own. Christensen and Miguel outline many of the challenges that prevent sensible replication in the first place (e.g. data unavailability), and note the different types of replication that are available to researchers. However, the problem continues to be that replications carry very little weight in the research literature. They are barely cited, and few credible researchers are willing to spend their scarce time and resources on replicating the work of others rather than doing their own new research.

Finally, Christensen and Miguel present some potential solutions to ensure greater transparency and reproducibility in economics, as well as improving the credibility of reported results. These include: (1) improved research design and greater use of meta-analyses (with improved tests for publication bias); (2) making appropriate corrections for multiple testing when many outcome variables, or many specifications, are used; (3) adopting pre-analysis plans that specify variables and model specifications in advance (which was a focus of the more recent JEP papers in my previous post); (4) improving disclosure and reporting standards; and (5) more open data and materials, to enable replication of published results. I think that they could have gone a step further by advocating for more open access to published research, as well as open peer review (both pre-publication and post-publication). Open peer review is scary (both for authors and for reviewers), but does enable more effective critique of (as well as support for) published research.

It is difficult to argue with any of the recommendations that Christensen and Miguel make. The sad thing is that, several years on, we don't seem all that much closer to achieving the goal of transparent, reproducible, and credible economic research.

[HT: Jacques Poot]


Monday 8 May 2023

Network externalities and electric vehicle subsidies

The government's Clean Car Discount scheme has been in the news this week. As Stuff reported:

The Government is changing the Clean Car programme to increase fees slapped on higher emitting vehicles, changing the rebates for zero emissions imports and lowering the threshold for eligible vehicles.

The changes come as the scheme was “successfully exceeding industry and government projections”, Transport Minister Michael Wood said, after it was reviewed a year into its full implementation.

It is timely to remind ourselves of the positive and negative aspects of a subsidy. A subsidy leads to a deadweight loss (as I outlined in the footnotes to this post from 2021 when the electric vehicle (EV) subsidy scheme was introduced). That is, there is a loss of economic welfare associated with a subsidy. However, as I noted in that same footnote:

...this assumes that there are no positive externalities associated with electric vehicles, which there probably are - a person buying an EV is a person not buying a carbon-powered vehicle, and so each EV sold reduces carbon emissions (and reducing a negative externality is the equivalent of a positive externality).

It is possible that a subsidy on a good that has a positive externality (or that reduces a negative externality) could increase total welfare. However, that is not the only reason that we might favour a subsidy for EVs.

Think for a moment about the infrastructure that exists to support petrol vehicles. In most towns and cities, there are many petrol stations where cars can refuel. It is very easy and inexpensive for a petrol vehicle owner to find somewhere to refuel. Now think about the corresponding situation for electric vehicle owners. Although there are electric vehicle charging stations around (in mall carparks, for example), the infrastructure is nowhere near as available as it is for petrol vehicles. An EV owner faces a more difficult and costly (in time and effort) exercise to charge their vehicle away from home.

Clearly, the fact that there are few places to charge EVs means that there is not as much incentive for firms to provide EV charging as there is to provide petrol refueling. Most of the reason for a lack of EV charging infrastructure is simply a lack of demand. However, there is a bit of a chicken-and-egg problem here. If there are few places to charge EVs, then few consumers will buy EVs. And if few consumers buy EVs, then there is little incentive [*] for firms to provide EV charging stations. In other words, there are positive network externalities across these two goods (EVs, and EV charging stations). The more EVs there are, the more profitable it is for a firm to provide EV charging stations. The more EV charging stations there are, the more value there is for each EV driver, since they can more easily find somewhere to charge their vehicle.

Subsidising electric vehicles may help us to get out of this chicken-and-egg situation. Having more EV owners (because EVs are less expensive as a result of the subsidy) creates incentives for firms to build out the charging infrastructure necessary to support EVs. The subsidy acts as a kickstart for the process of building more EV charging stations, which then makes it easier to own an EV, which incentivises more charging stations, and so on. The process snowballs. So, quite aside from any argument associated with environmental externalities, there is an argument to be made for a subsidy.
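
The snowball can be illustrated with a toy simulation in Python. Everything in it is hypothetical: the 1,000 consumers, their stand-alone valuations of an EV (spread evenly from 0 to 49), the price of 60, the extra value of 1 that each charging station adds for every driver, and the rule that firms build one station per 10 EVs on the road. It is a sketch of the chicken-and-egg logic, not a model of the actual Clean Car Discount.

```python
def simulate(subsidy, periods=8, price=60, value_per_station=1,
             evs_per_station=10, num_consumers=1000):
    """Return a list of (evs, stations) pairs, one per period.

    All parameter values are made up for illustration.
    """
    # Stand-alone valuations, spread evenly over the integers 0..49.
    valuations = [i * 50 // num_consumers for i in range(num_consumers)]
    evs, stations = 0, 0
    path = []
    for _ in range(periods):
        # A consumer owns an EV if their valuation, plus the value of the
        # existing charging network, covers the price net of the subsidy.
        evs = sum(
            1 for v in valuations
            if v + value_per_station * stations >= price - subsidy
        )
        # Firms build stations in proportion to the number of EVs on the road.
        stations = evs // evs_per_station
        path.append((evs, stations))
    return path


print(simulate(subsidy=0)[-1])   # no subsidy: stuck with no EVs and no stations
print(simulate(subsidy=15)[:4])  # with a subsidy: adoption and stations feed each other
```

With these made-up numbers, no consumer buys an EV without the subsidy, so no stations are ever built. With a subsidy of 15, the first 100 buyers prompt 10 stations, which makes EVs attractive to more consumers, which prompts more stations, until all 1,000 consumers own an EV.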

However, what is not at all clear is whether this is the right subsidy to achieve the goal of kickstarting an EV-charging snowball. The same outcome could be achieved by subsidising the EV charging stations, rather than the EVs. Subsidising EV charging stations might even be less costly for the government, as it would mean dealing with fewer subsidy recipients (reducing the transaction costs of the subsidy). That is the approach adopted in the US, where Tesla is making its charging network available to owners of EVs from other manufacturers, in order to receive a subsidy from the US government.

The key difference between the two subsidy options is political. Subsidising consumers to buy EVs provides a handout to voters (or at least, to voters who buy EVs). Subsidising firms to build EV charging stations provides a handout to firms. Handouts to voters play out much differently in the media than handouts to firms. It should be little surprise then, that a government might prefer to subsidise EVs rather than EV charging stations, even if subsidising the charging stations would be less costly for the same outcome.

*****

[*] Note that this does not mean that there is no incentive at all for firms to provide EV charging stations. Firms with environmental goals, or firms that want to look like they are supporting green causes, may provide EV charging stations even if there is little demand for them.
