Friday, 19 June 2026

This week in research #131

Here's what caught my eye in research over the past week:

  • Chan (open access) finds that, between 1870 and 1910, ports that increased the share of steam in their shipping volumes increased their trade by diversifying their trade flows, both in the range of trading partner countries and in the range of products traded
  • Brodeur, Kattan, and Musumeci (with ungated earlier version here) study the relationship between statistical significance and placement outcomes for 200 empirical economics job market papers from 2018-2021, finding that marginally significant results are associated with higher academic placement likelihoods, providing a strong incentive for young researchers to 'p-hack' for statistical significance
  • Gershoni and Stryjan (with ungated earlier version here) find significant declines in both exam attendance and demonstrated knowledge following the switch to online instruction during the COVID-19 pandemic in Israel

Tuesday, 16 June 2026

My take on that iPhone-fertility paper

If you've been reading the news over the last week, you may have seen talk about new research linking fertility decline in the US to the release of the iPhone. For example, the New Zealand Herald reported that:

Middlebury College economist Caitlin Myers and her student Ezekiel Hooper tested a hypothesis that smartphones - which emerged with the arrival of the first iPhone in 2007 - might have something to do with it.

Until 2011, iPhones were available from a single US cellular network, AT&T, so they compared US counties that had near-universal AT&T coverage with those that had little or none during those years.

And they found that access to the iPhone correlated with reductions in births by 4.5% to 8% at ages between 15 and 19, and by 3.2% to 6.6% at ages between 20 and 24.

There were also statistically significant but smaller declines among older women.

Other news sources picked up that the research attributed 33 to 52 percent of the decline in fertility to the iPhone's release (see here and here, for example). That result made me sceptical, and my concerns really echo those of Tyler Cowen here:

In 2008, 1.9% is the share of the mobile-subscribing population with an iPhone wireless subscription.  As a percent of all adults that is 1.6%.

In 2009, it is 4.3%.  3.6% of all adults.

In 2010, 6.8%.  5.5% of all adults...

So when the authors talk about diffusion explaining 33–52% of the decline in the general fertility rate among American women 15–44, I still do not get how that is supposed to operate.

If less than six percent of all adults have an iPhone by 2010, how could iPhones reduce fertility by between one-third and half? This requires very large spillovers from a small group of early adopters, and I am not convinced the paper has made those spillovers quantitatively plausible (we'll get to the authors' views on that later).

The research is reported in this NBER Working Paper by Caitlin Myers and Ezekiel Hooper (both Middlebury College). They use data on national wireless broadband coverage at the census block level to categorise US counties into those where less than 10 percent of the population have coverage by AT&T ('control' counties) and those where more than 90 percent of the population have coverage by AT&T ('treated' counties). Their sample includes 1399 'control' counties and 914 'treated' counties (with 794 counties excluded from the sample). Myers and Hooper chose AT&T because AT&T had an exclusive arrangement with Apple for almost the first four years after the iPhone was launched in June 2007. The first Android phones didn't become available until October 2008, and didn't become widespread in the 'control' counties until a year later. So, there was a period where AT&T coverage is a reasonable proxy for the prevalence of iPhones.

Myers and Hooper then compare control counties with treated counties in terms of annual age-specific fertility rates (in five-year age groups). However, they recognise a key problem: the treated and control counties differ in meaningful ways, most obviously in that the treated counties are more urban than the control counties. This matters because fertility rates have been declining more rapidly in urban areas than in rural areas, so a simple comparison of treated and control counties would overstate the effect of iPhone coverage on fertility. Specifically, the CDC reports that from 2007 to 2017, the total fertility rate fell by 12 percent in rural counties (many of which will be in the control sample), but by 18 percent in large metro counties (which are almost certainly in the treated sample).
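To see how this could bias the results, here is a back-of-the-envelope sketch in Python. It uses only the CDC figures above and assumes, purely hypothetically, that the iPhone had no effect on fertility at all. The variable names are mine, and the CDC comparison covers 2007 to 2017 rather than the exact years and outcomes in Myers and Hooper's analysis, so this illustrates the direction of the bias rather than its size in their paper.

```python
# Back-of-the-envelope illustration, not Myers and Hooper's analysis.
# Declines in the total fertility rate, 2007-2017, as reported by the CDC (percent).
rural_change = -12.0   # rural counties: mostly in the 'control' group
metro_change = -18.0   # large metro counties: mostly in the 'treated' group

# Hypothetical assumption: the iPhone has no true effect on fertility.
true_iphone_effect = 0.0

# Naive comparison: change in treated counties minus change in control counties.
treated_change = metro_change + true_iphone_effect
naive_estimate = treated_change - rural_change
bias = naive_estimate - true_iphone_effect

print(f"Naive 'iPhone effect': {naive_estimate:.1f} percentage points")
print(f"Bias from the faster urban decline: {bias:.1f} percentage points")
```

Even with no true effect, the naive comparison shows a 6 percentage point 'decline', which is why Myers and Hooper need to re-weight their controls.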

Myers and Hooper try to deal with this problem by re-weighting their data in two ways. The first is by using an "entropy balanced Poisson event study", which effectively re-weights the control counties by giving more weight to those that are most similar to the treated counties in terms of their cross-sectional characteristics at the time of the iPhone launch. The second is by using a "synthetic difference-in-differences estimator", which creates a set of synthetic control counties by re-weighting the control counties so that the time series of fertility most closely matches each of the treated counties.
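For readers unfamiliar with entropy balancing, here is a minimal Python sketch of the idea for a single covariate. The helper name and the data are hypothetical: I treat 'urban share' as the only county characteristic and invent seven control counties. The weights take the form w_i ∝ exp(λx_i), which keeps them as close to equal as possible (in a relative-entropy sense) while making the weighted control mean match the treated mean. Myers and Hooper balance several characteristics at once and embed the weights in a Poisson event study, so this shows only the core re-weighting step.

```python
import math


def entropy_balance_weights(x, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Hypothetical one-covariate entropy balancing helper.

    Returns weights proportional to exp(lam * x_i), normalised to sum to one,
    with lam chosen so that the weighted mean of x equals target_mean.
    Requires min(x) < target_mean < max(x).
    """
    def weights(lam):
        # Subtract the largest exponent before exponentiating, for numerical stability.
        top = max(lam * xi for xi in x)
        raw = [math.exp(lam * xi - top) for xi in x]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * xi for w, xi in zip(weights(lam), x))

    # The weighted mean increases with lam, so bisection finds the balancing lam.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2)


# Invented example: urban share of seven control counties, and a treated-group mean.
control_urban_share = [0.05, 0.10, 0.15, 0.20, 0.30, 0.60, 0.80]
treated_mean = 0.55

w = entropy_balance_weights(control_urban_share, treated_mean)
balanced_mean = sum(wi * xi for wi, xi in zip(w, control_urban_share))
unweighted_mean = sum(control_urban_share) / len(control_urban_share)
print(f"Unweighted control mean: {unweighted_mean:.3f}")
print(f"Balanced control mean:   {balanced_mean:.3f}")
print("Weights:", [round(wi, 3) for wi in w])
```

In this invented example, most of the weight ends up on the two most urban control counties, because they are the only ones that look anything like the treated group.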

Using those methods, Myers and Hooper find the results that the news media has picked up. Specifically:

Both estimators imply large, statistically significant declines in births to young women. The post-gestation ATT ranges from −4.5 to −8.0% at ages 15–19 and −3.2 to −6.6% at ages 20–24 (the entropy-balanced Poisson at the lower-magnitude end, SDID at the higher), with smaller effects at older ages. Scaled to the U.S. county universe, these estimates imply the iPhone accounts for between 33 and 52% of the 2007–2011 decline in the general fertility rate. The pattern is similar across race, parity, marital status, and education, with the exception of Black women, for whom we estimate no effect.

The key results are summarised in Figure 3 from the paper (for the entropy balanced Poisson event study):

And in Figure 4 from the paper (for the synthetic difference-in-differences (SDID) estimator):

In both cases, the point estimates from the time before 2008 show no statistically significant difference between treated and control counties, while there is a negative (and increasing) difference between treated and control counties from 2008 onwards. However, notice that in Figure 3 (the first figure above), it seems clear visually that the downward trend starts before 2008, even if it is statistically insignificant. In Figure 4, there is no pre-trend, but remember that in the SDID analysis, the controls are reweighted to replicate the pre-treatment time series of fertility for the treated counties, so there should be no difference in the pre-treatment values by construction.

Myers and Hooper run various robustness checks that address some of the more obvious criticisms of their approach, including sensitivity to the choice of treatment and control cutoffs, using a continuous treatment variable, estimating the model in levels rather than logs, various placebo treatments, and truncating the sample to exclude any contamination from the release of Android phones. Among the placebo tests, they run analyses using Verizon's and Sprint's pre-2011 coverage, and find no effects. So, their findings are not simply picking up general differences between counties that attract mobile operators and those that don't. They also address the plausibility of the results, noting that:

The iPhone is not a treatment that operates at the individual level. Whether one’s own phone matters likely depends on whether one’s peers have phones; a phone in a friend group full of non-owners is a different intervention than a phone in a group where everyone has one. Spillovers run between phone-owning peers and their non-owning friends, and operate at the level of the group, not just the match: if smartphones reduce friend-group meetups and parties, then matches that would have formed under no-iPhone simply never do—the unformed match is itself the outcome.

That may be so, but the implied size of the spillovers is far larger than is plausible. If, as Cowen's figures suggest, less than 15 percent of the population had iPhones, then the overall effect simply can't be that large, unless iPhone ownership and the spillovers from iPhone ownership were heavily concentrated among women of childbearing age.

So, what has gone wrong? The overall approach that Myers and Hooper apply seems valid on the face of it, and re-weighting controls to better match the treated sample is a common method of causal inference. The problem here is that the weighting is extreme. Myers and Hooper note that, in relation to the entropy balanced Poisson event study approach:

Balance comes at a cost: equalizing the marginal means requires putting high weight on a small number of treated-like controls. The Kish (1965) effective sample size of the balanced control pool is 77 out of 1,399 raw controls...

So, basically the analysis is heavily skewed towards a comparison between the treated counties and a small number of control counties, which are the control counties that are most like the treated counties (which also makes them the most unlike the other control counties). Those control counties are doing a lot of the work in this analysis.
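The Kish effective sample size has a simple formula: the square of the sum of the weights, divided by the sum of the squared weights. Here is a short Python sketch. The 1,399 controls match the paper, but the weights are invented (an assumption chosen to mimic a balanced pool that leans heavily on a few treated-like counties), not Myers and Hooper's actual weights.

```python
def kish_effective_sample_size(weights):
    """Kish (1965) effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Equal weights: every one of the 1,399 control counties counts fully.
equal_weights = [1.0] * 1399

# Invented weights: 80 'treated-like' controls get 200 times the weight of the rest.
skewed_weights = [1.0] * 1319 + [200.0] * 80

print(round(kish_effective_sample_size(equal_weights)))   # 1399
print(round(kish_effective_sample_size(skewed_weights)))  # 94
```

So a pool of 1,399 control counties can behave, statistically, like fewer than a hundred once the weights are this concentrated.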

There are also other possible differences between urban and rural counties that are approximately contemporaneous with the release of the iPhone. First among these is the 'Great Recession' and the housing slump around that time. Myers and Hooper do control for county-level changes in house prices, which reduces concerns about contamination from that source. They also control for unemployment and poverty rates, which might pick up differential changes in labour markets. There was also a change in contraceptive availability that directly affects young women's fertility: expanded access to the 'morning after pill' for 17-year-olds, although that occurred in 2009. Finally, after the 'Great Recession' there was a slowdown in Hispanic immigration, which might have affected urban and rural counties differently. Hispanic immigrants tend to have higher fertility than the US-born, so if the decline in Hispanic immigration was greater in treated counties than in control counties (and especially the small number of heavily weighted control counties), then that might explain some of the measured effect. Myers and Hooper control for county Hispanic population share. However, it would be better to control for Hispanic population share among the age group that is being analysed, or to control for changes in Hispanic immigration.

This paper has certainly gotten people talking. Smartphones might be part of the story of why fertility has declined, but I don't think that we should uncritically take away from this study that the iPhone caused half of the decrease in US fertility between 2007 and 2011. More likely, the iPhone had a modest effect (if any), and the estimates are confounded by a number of other changes that differentially affected rural and urban US counties at around the same time.

[HT: Marginal Revolution]

Sunday, 14 June 2026

Book review: How to Think Like an Economist (Roger Arnold)

If you ask many economics teachers, they will tell you that they really want to teach students how to think like an economist. However, in amongst the supply and demand curves, the elasticities, and the multiplier effects, the core goal of teaching students to actually think like an economist gets lost, overwhelmed by a lot of 'do this stuff like an economist'. So, it's interesting when a book actually tries to get behind the models and teach the underlying thinking.

That's what the 2005 book How to Think Like an Economist, by Roger Arnold, tries to do. Arnold explains that:

To teach students how economists think, we must tell them stories. While we tell the stories, we must point out just what is "running through the economist's head." In this book, I have tried to focus on what goes through the economist's head as he or she looks at the world.

And mostly, Arnold is successful, although it isn't always the case that every economist would think in the same way. For example, Arnold makes a big deal about ratios. And while ratios are important, I for one am never thinking about the ratio of marginal benefit to marginal cost, when I can simply think about which one is larger. The ratio is redundant.

There is a lot to like about this book, and Arnold surfaces some of the more surprising (to non-economists) ways that economists would think about problems. For example, who but an economist would even ask the question, "What is the optimum amount of hitting yourself in the head with a hammer?". And yet, Arnold treats us to a consideration of exactly that question in the second chapter.

Having said that, I felt like the book was quite uneven. Although Arnold warns readers at the beginning that the book is intended as a companion to a more thorough textbook treatment of economics, and gives examples of how the chapters can be mixed and matched to suit various styles of economics course, a reader reading the book chapter by chapter is constantly confronted with terminology that is left unexplained until later chapters. This was most jarring in the case of the 'equilibrium price', which came with no explanation of what equilibrium is, nor why the equilibrium price is important at all. Similarly, Arnold first uses the term ceteris paribus without explaining what it means. And if you want to understand how the economist thinks, understanding the meaning of ceteris paribus (which, for the record, means holding all else constant) is kind of important.

Arnold also betrays a lack of understanding of some real-world context. Blackjack is provided as an example of a zero-sum game played between the players. However, blackjack in the real world is not at all like that. Blackjack players are playing against the house, not against each other. One blackjack player's win does not in itself entail a loss to the other players.

So, although understanding how economists think is important, and I applaud the effort and the approach that this book takes, I feel like it fell a bit short of the mark. This book is long out of print, but that might not be such a bad thing.

Friday, 12 June 2026

This week in research #130

Here's what caught my eye in research over the past week:

  • Fumarco and Groero (open access) describe a Stata package that reduces a dataset down to just those variables that are used in a particular .do file (useful for creating replication packages while minimising data bloat)
  • Cox (open access) describes three Stata commands that create a new dataset of the quantiles, percentiles, or confidence intervals for a particular variable or result (if you've ever needed to do this, you will know how frustrating it is)
  • Yarashov, Baryshnikova, and Kakhkharov find that military expansion exerts a significant negative impact on fertility across 15 post-Soviet countries between 1992 and 2022
  • Chatterjee, Dimova, and Ojha (open access) find, using a correspondence study in urban India, that equally qualified single mothers are much less likely to receive interview callbacks than unmarried women without children, married women, and married mothers
  • Charness et al. (with ungated earlier version here) provide a convincing argument for the virtues of lab experiments in economics
  • In a companion piece, Gneezy examines the principles of experimental economics
  • Wang finds that China's policy limiting young people's access to online video games did not produce detectable effects on academic performance, study time, or health
  • Pritchett and Viarengo (open access) demonstrate that ad hoc poverty lines, including the World Bank's poverty lines, are far too low to be plausible candidates for an inclusive global poverty line

Wednesday, 10 June 2026

Is it working from home, and not generative AI, that is harming the prospects of young workers?

There is growing evidence that the labour market for young workers is challenging. Graduates are finding it more difficult to get jobs after graduation. Several research papers have noted that generative AI may be to blame (see this post, for example), with one research paper referring to the changes in the labour market as seniority-biased technological change (see this post).

But the challenge with trying to attribute changes in the labour market to the rise of generative AI is that there are other contemporaneous changes affecting the labour market as well. One of those changes is the rise of working from home (as I noted in yesterday's post). Working from home may reduce the prospects for junior workers in part because it costs more to supervise and monitor them when they are working from home. Junior workers also benefit from on-the-job learning when they work with other people, and that on-the-job learning is less effective when they work from home. Combining those two effects, working from home reduces the incentive for employers to hire junior workers.

This new working paper by Peter Lambert (University of Warwick) and Yannick Schindler (Ellison Institute of Technology, Oxford) tries to disentangle the effects of generative AI and working from home on employment of younger workers. They use data from Revelio Labs that is made up of monthly matched employer-employee records collected from résumés (predominantly from LinkedIn) to construct a measure of the junior share of all new hires. They also use data from Lightcast on the near-universe of online job postings across thousands of online job sites and other websites. They use the Lightcast data to construct a measure of the share of job postings that require three or fewer years of experience. Their data from both sources covers the period from 2017 to 2025, and includes four countries: the US, the UK, Canada, and Australia.

Lambert and Schindler then use that data, along with measures of 'exposure to generative AI' and 'exposure to working from home' at the occupation level, in a difference-in-differences strategy. That means that they essentially compare the change in the share of junior job hires (or job postings) between occupations that are more or less exposed to generative AI (or working from home). Their main results are neatly summarised in Figure 3 from the paper:

Panel (a) shows that the junior share of new hires decreases significantly in jobs that are more exposed to working from home, from 2023 onwards (the black line). When they also control for exposure to generative AI (the red line), the effect of working from home barely changes. In contrast, Panel (b) shows that the junior share of new hires also decreases significantly in jobs that are more exposed to generative AI, from 2023 onwards (the black line). However, when they also control for exposure to working from home (the blue line), the effect of generative AI becomes much smaller and statistically insignificant. The results are similar for the share of job postings requiring three or fewer years' experience, as shown in Panels (c) and (d) of the figure.
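The logic of 'controlling for' the other exposure measure can be seen in a small Python sketch. Everything here is invented: six hypothetical occupations, exposure measures in standard-deviation units, and changes in the junior share of new hires (post minus pre, in percentage points) that by construction depend only on WFH exposure. AI exposure is correlated with WFH exposure but has no effect of its own. This is a simplified two-period, continuous-treatment difference-in-differences, not Lambert and Schindler's actual event-study specification.

```python
def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def slope_one(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    y, x = demean(y), demean(x)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slopes_two(y, x1, x2):
    """OLS slopes from regressing y on x1 and x2 (with an intercept)."""
    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Invented data for six occupations (exposures in standard-deviation units).
wfh_exposure = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
other_variation = [0.5, -0.5, 0.0, 0.0, -0.5, 0.5]  # uncorrelated with WFH exposure
ai_exposure = [w + o for w, o in zip(wfh_exposure, other_variation)]

# Change in junior share of new hires, post minus pre (percentage points).
# By construction, only WFH exposure matters: -2 points per SD.
junior_share_change = [-2.0 * w for w in wfh_exposure]

print("WFH only:", slope_one(junior_share_change, wfh_exposure))  # -2.0
print("AI only: ", slope_one(junior_share_change, ai_exposure))   # -1.75
print("Both (WFH, AI):", slopes_two(junior_share_change, wfh_exposure, ai_exposure))
```

On its own, AI exposure looks harmful (a slope of -1.75), but once WFH exposure is included its coefficient falls to zero. That is qualitatively the pattern in Panel (b), although real occupations are far noisier than this toy example.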

The effects are quite large, too. A one-standard-deviation increase in exposure to working from home reduces the junior share of new hires by about two percentage points, and the share of job postings requiring three or fewer years' experience by 1.5 percentage points.

Lambert and Schindler conclude that, based on their results, working from home is a better predictor of the decline in junior hiring than generative AI. Given the potential benefits of working from home, they are reluctant to recommend policies against it, instead noting that:

...micro-level adjustments may be required to help firms adapt their organizational practices, so as to enjoy the benefits of WFH [work from home] arrangements while simultaneously managing the development of early-career talent.

Seen alongside the negative mental health impacts of working from home (as noted in yesterday's post), this should give us further pause for thought. However, it is worth noting that even if working from home is a better predictor of reductions in junior hiring than generative AI within their model, that doesn't let generative AI off the hook entirely. Since both trends are happening at the same time, reducing working from home might not eliminate the negative impacts on junior hiring, but instead make generative AI appear more important as an explanation. Lambert and Schindler note early in their paper that it is often the same occupations (white-collar occupations) that are most exposed to both working from home and generative AI. Given that, perhaps Lambert and Schindler's recommendation for micro-level changes in organisational practice may be the best mitigation strategy available to us.

[HT: Marginal Revolution]

Tuesday, 9 June 2026

Two new studies on who works from home, and its mental health impacts

The pandemic caused a massive rise in working from home and now, even though lockdowns are long since over and many workers have returned to the workplace, we are beginning to understand working from home (WFH) a lot better. Two new studies have recently added to our understanding.

The first is this article by Cevat Giray Aksoy (European Bank for Reconstruction and Development) and co-authors, published in the AEA Papers and Proceedings (ungated earlier version here). They use data from the monthly US Survey of Working Arrangements and Attitudes, limiting their data to the period from January 2024 to December 2025, and document three facts about WFH. First, employees are more likely to work from home if they work for a younger firm, and WFH peaks among those working for employers that were founded at the height of the pandemic, in 2020.

Second, employees are more likely to work from home if they work at a firm with a younger CEO. Specifically:

Firms led by CEOs under 30 have an average of 1.4 WFH days per week, compared with 1.1 days at firms led by CEOs who are 60 or older.

That doesn't seem like a lot, but an additional 0.3 days per week is a little more than three working weeks per year of WFH for those working for the youngest CEOs compared with those working for the oldest. However, this relationship between CEO age and WFH appears to be partly explained by the fact that younger CEOs are more likely to be leading younger firms. When Aksoy et al. put both CEO age and firm age in the same regression model, only firm age remains statistically significant. It is a similar story for CEO gender, which is initially statistically significant, but since female CEOs tend to be younger and to be CEOs of younger firms, CEO gender isn't statistically significant once those other variables are controlled for.
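The 'three working weeks' figure is a back-of-the-envelope calculation. Here it is written out, assuming (my assumption, not Aksoy et al.'s) a five-day working week and work in all 52 weeks of the year; allowing for a few weeks of annual leave brings it to just under three weeks, so it is roughly three working weeks either way:

```latex
(1.4 - 1.1)\,\tfrac{\text{days}}{\text{week}} \times 52\,\text{weeks}
  = 15.6\ \text{days}
  = \tfrac{15.6}{5}\ \text{working weeks}
  \approx 3.1\ \text{working weeks per year}
```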

Third, the self-employed are much more likely to work from home. Specifically:

Self-employed workers report two to three times as many WFH days per week as wage and salary employees, depending on employer size. Compared to wage and salary employees, the self-employed are more than three times as likely to work in a fully remote capacity.

This last result is not entirely surprising, given that the self-employed typically have a lot more flexibility over scheduling. And, the self-employed may be the type of people who most value flexibility as well.

The second new article is this one by Natalia Emanuel (Federal Reserve Bank of New York), Emma Harrington (University of Virginia), and Amanda Pallais (Harvard University), published in the prestigious journal Science (open access). They look at the mental health impacts of WFH, using US data from a variety of sources, and a difference-in-differences approach. This involves comparing occupations that are more or less amenable to WFH, between the time before the pandemic and the time after the pandemic. They refer to the occupations that are more amenable to WFH as 'remotable'.

Emanuel et al. first document the dramatic rise of WFH:

The pandemic led to a large increase in remote work for those in remotable jobs, such that by 2024, workers in remotable jobs spent 31.1% of workdays fully remote, whereas people in nonremotable jobs spent only 8.9% fully remote... Those in remotable jobs experienced a 17.9 percentage point (pp) differential increase in fully remote work...

They then show that this rise is associated with more time spent alone:

Along with spending less time in the office, workers in remotable jobs spent more time working alone after the pandemic, logging 1.2 more work hours alone per day relative to nonremotable workers (58.0% increase; P < 0.0001).

Even for those of us who are introverts, more alone time may not necessarily be a good thing. Emanuel et al. are concerned about how WFH and working alone affects mental health. Their main outcome variable is the Kessler (K-6) Psychological Distress Scale, which is:

...based on how often in the past 30 days the respondent felt worthless, hopeless, restless, nervous, that everything is an effort, or so sad that nothing could cheer them up...

Their main source of data is the Panel Study of Income Dynamics covering the period from 2011 to 2023 (from which they exclude the pandemic years 2020 and 2021). Analysing that data, they find that:

Between the pre-and postpandemic periods, mental distress increased for everyone, but it increased significantly more for those in remotable jobs...

Among those in remotable jobs, there was a 0.3 unit increase in the K-6 distress score relative to an average score of 3.0 before the pandemic (standard deviation change = 0.08; P = 0.063) in the Panel Study of Income Dynamics (PSID). In the National Health Interview Study (NHIS), we found the same 0.3 unit deterioration (P = 0.007). We saw deterioration in each of the six subcomponents of the K-6 distress scale: feeling worthless, hopeless, restless, nervous, that everything is an effort, and so sad that nothing can cheer them up...

Importantly, the deterioration in mental health is concentrated among people living alone, which is consistent with the idea that WFH affects mental health through increasing social isolation. Emanuel et al. also find that people in remotable jobs are more likely to seek help from a mental health practitioner, and take relatively more prescription medications for mental health conditions such as anxiety or depression. These changes aren't simply the result of greater flexibility allowing more time to be devoted to health care generally, as there was no change in visits to the doctor and no change for other prescription medications such as statins.

Finally, Emanuel et al. looked at whether the rise of generative AI, rather than the increase in WFH, might explain the results (an important check, given the paper I will blog about tomorrow). They find that results from the same analysis, but substituting an AI occupational exposure index in place of the 'remotability' index, are not statistically significant.

Now, many workers are very keen on WFH - as noted in this post, about half of Australian workers would be willing to give up some salary in order to work from home. Why would people choose more WFH if it may worsen their mental health? Of course, a rational worker would weigh up the benefits and costs of WFH, and may decide that the mental health costs are more than offset by other benefits. However, Emanuel et al. point to another related possibility, which is:

...that the benefits of remote work (e.g., skipping a daily commute) are immediate and salient, whereas the costs of remote work (e.g., frayed connections with co-workers) take time to materialize.

So, a rational worker may essentially be weighing up benefits that occur today against uncertain costs that may occur sometime in the future, and which therefore should be discounted (in the same way that we should discount future cashflows in a financial analysis). In that sort of exercise, where the mental health costs are discounted, it is more likely that workers would choose to work from home. They would be even more likely to do so if they are quasi-rational and heavily discount the future (as I note in the first week of my ECONS102 class), because then the mental health costs would be discounted even more heavily. Finally, maybe workers are simply unaware of the mental health costs of WFH. If that is the case, then an information intervention might be helpful in improving mental health among workers who would otherwise be WFH. In the meantime, this research suggests that the post-pandemic rise in WFH may have contributed to the growing mental health crisis, especially through increased time spent alone.
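Here is a minimal Python sketch of that argument, under assumptions that are entirely mine: an immediate benefit of 100 (say, commuting time saved), a mental health cost of 150 that arrives three years later, an annual discount factor of 0.95, and, for the quasi-rational worker, the common 'beta-delta' (quasi-hyperbolic) model of present bias with beta = 0.6. The units, numbers, and function name are all hypothetical.

```python
def net_value_of_wfh(benefit_now, future_cost, years_until_cost, delta, beta=1.0):
    """Net present value of choosing WFH today (hypothetical model).

    benefit_now: immediate benefit (e.g. commute avoided), received today
    future_cost: mental health cost that materialises after years_until_cost
    delta: standard (exponential) annual discount factor, 0 < delta <= 1
    beta: present-bias factor; beta = 1 is exponential discounting,
          beta < 1 is quasi-hyperbolic (beta-delta) discounting
    """
    discounted_cost = beta * (delta ** years_until_cost) * future_cost
    return benefit_now - discounted_cost


exp_val = net_value_of_wfh(100, 150, 3, delta=0.95)
qh_val = net_value_of_wfh(100, 150, 3, delta=0.95, beta=0.6)
print(f"Exponential discounter: {exp_val:.1f} -> choose WFH? {exp_val > 0}")
print(f"Present-biased discounter: {qh_val:.1f} -> choose WFH? {qh_val > 0}")
```

With these numbers, the exponential discounter turns down WFH, but the present-biased worker takes it up, even though both face exactly the same costs and benefits.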

[HT: Marginal Revolution for the Emanuel et al. article]

Monday, 8 June 2026

Maybe hosting the Olympics just shuffles income around a country, rather than increasing it

There is a large, and still growing, literature on the economic impact of large sporting events (see this post, and the links at the end of it, for some examples). My conclusion from that body of research is that large sporting events are expected to generate large economic impacts (based on studies conducted before the event), but generally the actual economic effects are small or non-existent (when measured after the event). However, the studies are typically based on a single event, or a small number of events. Are the typical null results driven by a small sample size and if so, would a larger and more diverse sample demonstrate different results?

That is the question essentially underlying this 2021 article by Matthias Firgo (Austrian Institute of Economic Research), published in the journal Regional Science and Urban Economics (ungated earlier version here). Firgo looks at the effect of the Olympic Games (both summer and winter) on regional GDP per capita in the host region (not GDP per capita in the whole host country, or only in the host city), using data from the 1992 Winter Olympics in Albertville to the 2020 Summer Olympics in Tokyo. Importantly, Firgo uses a control group made up of regions with cities that had been shortlisted by the International Olympic Committee (IOC) to host in the same year, but were unsuccessful (more on that a bit later).

Because of data limitations, Firgo focuses on GDP per capita as a percentage of national GDP per capita - essentially a relative measure of wellbeing at the regional level. Using this measure, he finds that for the Summer Olympics:

...regional per capita GDP significantly increases by 3.6 %-points (3.3 %-points) relative to national per capita GDP in the year of the event (the year before the event).

In other words, the host region’s GDP per capita rises by around 3 to 4 percentage points relative to national GDP per capita in the lead-up to the event. In contrast, there is only very weak evidence of any persistent effect of the event on regional GDP per capita, and the Winter Olympics (which are a smaller event, and typically held in smaller cities) had no significant effects. The positive effect of the Summer Olympics on regional GDP per capita in the years immediately before the event is consistent with increasing spending on infrastructure (including sporting, transport, hospitality, and cultural infrastructure) in the lead-up to a substantial event. That there is no persistent effect is fairly consistent with the other research on the economic impact of large events.

However, there are two other things to take away from this research. First, if anything these results might overstate the impact of successfully bidding for the Olympics. Whether a potential host city's bid is successful or not is not a random event. Cities that are more likely to be successful hosts should, at least in theory, be more likely to be selected as hosts. So, the control group is an imperfect comparator for the treatment cities in a way that is likely to bias the results. If successful hosts were cities that the IOC believed were already on an upward trajectory at the time of the Olympics, then that would bias upwards the estimated impact of the event. Of course, such foresight from the IOC would have to be executed seven years before the event (which is when the hosts are typically selected), but nevertheless there is potential for upward bias. That said, shortlisted cities are still likely to be a better comparison group than all non-host cities, since they had already demonstrated some capacity and willingness to host.

Second, these results tell us more about relative effects within the host country, rather than absolute economic impacts. They show that the GDP per capita increases in the host region relative to the rest of the country. Given that the overall economic impact is small to negligible, as are population changes arising around the event (both of which many other studies have shown), a large part of the relative increase in GDP per capita in the host region must arise from a combination of increased GDP per capita in the host region, and decreased GDP per capita in other regions in the same country. Effectively then, hosting the Summer Olympic Games simply shuffles income around a country in the lead-up to the games, with the host region benefitting while other regions are negatively impacted. Then after the event, there is a return to the normal inter-regional distribution of incomes.
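The 'shuffling' argument is really just arithmetic. Here is a minimal sketch in Python with two made-up regions: a host region and the rest of the country. All numbers are hypothetical, not taken from the paper. National GDP and population are held fixed, and some GDP is simply moved from the rest of the country to the host region:

```python
def regional_outcomes(shift, host_gdp=100.0, host_pop=2.0,
                      rest_gdp=400.0, rest_pop=8.0):
    """Move `shift` units of GDP from the rest of the country to the host region.

    Returns (host GDP per capita as a % of national GDP per capita,
             national GDP per capita, host GDP per capita,
             rest-of-country GDP per capita). All default values are hypothetical.
    """
    new_host = host_gdp + shift
    new_rest = rest_gdp - shift
    national_pc = (new_host + new_rest) / (host_pop + rest_pop)  # unchanged by the shift
    host_pc = new_host / host_pop
    rest_pc = new_rest / rest_pop
    return 100 * host_pc / national_pc, national_pc, host_pc, rest_pc


for shift in (0.0, 3.5):
    rel, national, host, rest = regional_outcomes(shift)
    print(f"shift={shift}: host at {rel:.1f}% of national, "
          f"national={national:.2f}, rest of country={rest:.2f}")
```

Moving 3.5 units of GDP raises the host region from 100 to 103.5 percent of national GDP per capita (a 3.5 percentage point relative gain). National GDP per capita stays at 50, and the rest of the country falls slightly. A relative gain of that kind is entirely consistent with no national gain at all.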

The Olympic Games is a large spectacle - an opportunity for national celebration as we watch sporting heroes compete to win medals. The evidence still suggests that the Games are not a source of sustained economic growth, and that any short-run gains may be highly localised rather than national, and some of those gains come at the expense of other regions.


Saturday, 6 June 2026

Book review: The Nvidia Way

The biggest news story about stock markets over the last three years has probably been the dramatic rise of technology stocks, and particularly those related to AI. And among those stocks, one of the standout performers has been computer chip maker Nvidia. The success of Nvidia now hides the fact that the company had many close calls, where it was literally on the verge of closing down. That is one of the key facts that I learned from reading The Nvidia Way, by Tae Kim.

Kim was previously a technology columnist at Bloomberg, and he tells us he wrote several columns critical of Nvidia. Nevertheless, Nvidia gave him unprecedented access to its staff and, more importantly, to CEO Jensen Huang. And that is important, because the story of Nvidia, and of 'the Nvidia way', is undeniably a story of Jensen Huang. Huang wasn't the only founder of Nvidia, but he has been the face of the company, the driving force behind its successes, and the person most responsible for picking up the pieces after its frequent failures. Kim writes that:

In all my years covering business, as a consultant, an analyst, and now as a business writer, I have never met anyone quite like Jensen. In the field of graphics, he is a pioneer. In the harsh technology market, he is a survivor. And he has been a CEO for more than thirty years - marking him, as of this writing, the fourth-longest currently-serving CEO in the S&P 500...

Kim clearly has a lot of respect for Huang, and this shines through the whole book. Even where other authors would press on the more negative aspects of Huang's personality, such as his ultra-competitive nature, Kim is more measured:

Jensen was so competitive that he challenged other employees even when he was at a disadvantage. In high school, CFO Geoff Ribar had ranked among the top fifty chess players in the country. His boss, however, would not accept that someone else was better than him...

Jensen attempted to close the gap between his and Ribar's chess skills through brute-force learning. He memorized chess openings and sequences of moves, so that he would control the board. Yet Ribar found his playing style predictable... Every time he lost, Jensen would swipe his arm across the board, knocking over the pieces, and storm away. He would sometimes later insist on a rematch on the ping-pong table. Ribar graciously accepted, knowing Jensen was purposely shifting the competition onto more favorable territory.

It is worth noting that Huang was a champion table tennis player. His competitiveness has clearly served him well in business, and is one of the key factors in Nvidia's success.

So, what is 'the Nvidia way', after which the book is titled? Kim notes that it has several characteristics, including hiring raw talent (especially through aggressive recruitment methods), an emphasis on retaining high-quality employees, a strong focus on a culture of excellence, the high demands that culture in turn places on those employees, and the leadership of Huang himself. Not all of these characteristics, and especially not Huang, could necessarily be replicated at other companies. However, there is a lot that budding leaders could nevertheless learn from this book.

Having said that, there is one element that the book could have explored in more depth. There were many occasions when Nvidia was close to failure, including following the release of one of its very first chips. Obviously, Nvidia is wildly successful as a company now. But should we interpret the company's success in spite of its challenges as the result of good management, culture, and hard work, or as the result of luck? In other words, how much of Nvidia's observed success is simply survivorship bias? Kim clearly attributes the company's success to its own efforts, but it would have been good for him to turn a more critical eye to just how lucky Nvidia had been at key points.

Despite that gripe, I really enjoyed this book. I distinctly remember buying an Nvidia GeForce graphics card many years ago. Kim does a great job of bringing to life all of the characters and their contributions to the story, as well as the key events in the life of the company. If you're interested in understanding the rise of Nvidia, this book is recommended.

Friday, 5 June 2026

This week in research #129

Here's what caught my eye in research over the past week:

  • Araya et al. (with ungated earlier version here, but in Spanish) evaluate the impact of using the CORE textbook (which I use in my ECONS101 class) in introductory microeconomics in Uruguay, in comparison with a conventional textbook, finding no systematic differences in pass or dropout rates between the two courses, but that students using CORE are significantly more likely to believe that it contributed to their academic and professional development
  • Baker et al. (with ungated earlier version here) study the staggered rollout of unionisation across Canadian universities between 1970 and 2022, and find that unionisation compressed salaries, with wages at the bottom of the unconditional distribution increasing by roughly 10 percent, while wages at the top were unaffected
  • Baker et al. (but a different Baker, and with ungated earlier version here) provide a detailed summary of different types of difference-in-differences (DiD) research designs and their associated estimators, as well as discussing covariates, weights, handling multiple periods, and staggered treatment (this will be a highly cited resource, given the number of studies that use DiD for causal inference)

Wednesday, 3 June 2026

This research doesn’t convincingly show that biodiversity is good for business

I was interested to read this article in The Conversation last month by Paul Griffin (University of California, Davis) and Martien Lubberink (Victoria University of Wellington), mainly because of statements like this:

...firms operating in areas with richer biodiversity are measurably more productive.

I thought, that's interesting. This might be a good example to use in class next trimester to illustrate the difference between correlation and causation. After all, the authors may be correct that firms operating in areas with richer biodiversity are more productive (correlation), but that doesn't mean that biodiversity increases productivity (causation).

And then I read the paper that The Conversation article was based on. And at that point, I decided that I shouldn't use this as an example of the difference between correlation and causation, because even the correlations that they find are shaky at best.

The approach that Griffin and Lubberink take is to look at the relationship between measures of business output and measures of biodiversity. Their measure of business output is sales or gross profit, taken from Stats NZ's Longitudinal Business Database. They generally interpret this as a measure of productivity. And that is the first problem with the paper. Sales can be interpreted as gross revenue, and in some contexts sales may be used as a rough measure of gross output. But sales are not a good measure of productivity, and they are not a good measure of the economic value created by a business. The more appropriate measure would be value added (output minus the cost of intermediate inputs), or at least something closer to profit. To see why, consider two firms that both produce a product that sells for $1,000 per unit, and both firms sell 1,000 units per month. Both firms have sales of $1 million per month. Firm A buys the product wholesale at a cost of $800 per unit, then adds a mark-up. The value added of Firm A is $200,000 per month. Firm B buys raw materials of $200 per unit, adds labour of $300 per unit, and then sells the product. The value added of Firm B is $800,000 per month (labour costs are part of value added, because they are paid to the firm's own workers, and even after paying for that labour Firm B has $500,000 per month left over). Firm B creates a lot more economic value than Firm A, and yet measured by sales they are the same. Sales are therefore a poor measure of productivity. Gross profit is less problematic, because it subtracts at least some intermediate input costs, but even gross profit is not a pure measure of value added or productivity.
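The two-firm example can be checked with a few lines of Python. This is a minimal sketch: the `Firm` class and its field names are hypothetical, and Firm A is assumed to have no labour costs, since none are given. It treats labour as part of value added, as the national accounts do, and reports what is left after paying labour separately:

```python
from dataclasses import dataclass


@dataclass
class Firm:
    name: str
    price: float              # selling price per unit ($)
    units_per_month: int
    intermediate_cost: float  # bought-in goods or raw materials per unit ($)
    labour_cost: float = 0.0  # wages per unit ($): part of value added, not an intermediate input

    def sales(self) -> float:
        return self.price * self.units_per_month

    def value_added(self) -> float:
        # Value added = gross output minus intermediate consumption.
        return (self.price - self.intermediate_cost) * self.units_per_month

    def surplus_after_labour(self) -> float:
        # What is left of value added once workers have been paid.
        return self.value_added() - self.labour_cost * self.units_per_month


firm_a = Firm("A", price=1000.0, units_per_month=1000, intermediate_cost=800.0)
firm_b = Firm("B", price=1000.0, units_per_month=1000, intermediate_cost=200.0,
              labour_cost=300.0)

for firm in (firm_a, firm_b):
    print(firm.name, firm.sales(), firm.value_added(), firm.surplus_after_labour())
```

Both firms report identical sales, but their value added differs by a factor of four, which is exactly the information a sales-based measure throws away.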

As a measure of biodiversity, or more accurately as a set of proxies for biodiversity-related conditions and pressures, Griffin and Lubberink use a variety of indicators that they call 'biodiversity abundance markers' (for which, for some reason, they use the acronym BDAs). They aggregate data from a range of sources for their various BDAs (which I will discuss in further detail below), with the data at the SA2 level (SA2s are geographical areas approximately the size of suburbs in urban areas, and larger in rural or remote areas). They note that:

For each SA2, we define a vector of “biodiversity abundance markers” (BDAs), where each ranges from 0 to 100. We denote these ranks as BDA1, BDA2, … , BDAm. We then assign them to an SA2 and, therefore, to the businesses and employees in the same SA2. For a given BDA in an SA2, BDAm = 0 means complete biodiversity loss (high pressure from biodiversity loss) for marker m, and BDAm = 100 (low pressure from biodiversity loss) is equivalent to an SA2 with an undisturbed or fully intact natural state.

So far, so good. The only issue with that approach is that the measures of biodiversity don't have a natural interpretation, because they are just an index. But we often work with indices - you just need to be cautious about how you interpret the magnitude of the effects. Griffin and Lubberink start by showing the correlation between each of their BDA measures and their measures of business output.

However, then they want to create an overall index of biodiversity, and to do this they:

...multiply each BDA by its SA2 land area and denote the result as an empirical proxy for the natural capital (n) of an SA2 applicable to the businesses operating therein.

Remember that the BDA is an index, bounded between 0 and 100, and it has no natural interpretation in terms of magnitude. So, multiplying the index by the land area of the SA2 is not meaningful, because the BDA is not a measured biodiversity stock per square kilometre. I guess it might make sense if you wanted to calculate a weighted average index, where the weights are based on SA2 land areas, but that isn't what Griffin and Lubberink are doing. Their approach is problematic because it mechanically causes the measured biodiversity to be higher in rural areas ceteris paribus (holding all else equal), where SA2s are larger, and lower in urban areas, where SA2s are smaller. Within urban areas, ceteris paribus it causes higher measured biodiversity in industrial and commercial areas, where SA2s are larger, and lower in residential areas, where SA2s are smaller.
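A small sketch with three hypothetical SA2s shows the mechanical effect. The index scores and land areas are made up for illustration. Multiplying each BDA by land area, as the paper's natural capital proxy does, reverses the ranking of the index itself, whereas an area-weighted average stays on the index's own 0 to 100 scale:

```python
# Hypothetical SA2s: (label, BDA score on the 0-100 scale, land area in km^2).
sa2s = [
    ("urban residential", 90.0, 1.5),
    ("urban industrial", 60.0, 6.0),
    ("rural", 30.0, 900.0),
]

# The proxy as described in the paper: each BDA multiplied by SA2 land area.
bda_times_area = {label: bda * area for label, bda, area in sa2s}

# An area-weighted average, by contrast, keeps the 0-100 scale of the index.
total_area = sum(area for _, _, area in sa2s)
area_weighted_bda = sum(bda * area for _, bda, area in sa2s) / total_area

print("highest BDA:", max(sa2s, key=lambda s: s[1])[0])
print("highest BDA x area:", max(bda_times_area, key=bda_times_area.get))
print(bda_times_area, round(area_weighted_bda, 1))
```

The rural SA2 has the lowest index score but by far the largest 'natural capital', purely because it covers more land.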

Griffin and Lubberink then aggregate their index-multiplied-by-land-area measures in various ways. The aggregation approach they adopt is fine, but when you aggregate numbers that are not individually meaningful, the result is not meaningful either.

But let's take a step back, because there is another problem. Griffin and Lubberink pitch their analysis as based on a Cobb-Douglas production function. That is fine - a Cobb-Douglas function is a way of relating inputs to output. We already know that their measure of output is faulty. Their inputs are also faulty. Their three-factor Cobb-Douglas function includes inputs of financial capital, human capital, and natural capital.
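For readers less familiar with it, here is a minimal sketch of a generic three-factor Cobb-Douglas function. The symbols are my own labels (K for financial capital, H for human capital, N for natural capital, which the paper denotes n), the parameter values are hypothetical, and the paper's exact specification may differ. The point is that the exponents are elasticities, so taking logs turns the function into a linear regression, and each estimated exponent is only as meaningful as the input variable attached to it:

```python
import math


def cobb_douglas(A, K, H, N, alpha, beta, gamma):
    """Y = A * K**alpha * H**beta * N**gamma (generic three-factor form)."""
    return A * K**alpha * H**beta * N**gamma


# Hypothetical parameter values and input levels, for illustration only.
params = {"alpha": 0.3, "beta": 0.6, "gamma": 0.1}
A, K, H, N = 1.0, 100.0, 50.0, 20.0

y0 = cobb_douglas(A, K, H, N, **params)
# In logs the function is linear: ln Y = ln A + alpha ln K + beta ln H + gamma ln N.
log_linear = (math.log(A) + params["alpha"] * math.log(K)
              + params["beta"] * math.log(H) + params["gamma"] * math.log(N))

# Raising N by 1% raises Y by roughly gamma percent; the ratio of log changes is gamma.
y1 = cobb_douglas(A, K, H, N * 1.01, **params)
elasticity_n = math.log(y1 / y0) / math.log(1.01)

print(round(math.log(y0), 6), round(log_linear, 6), round(elasticity_n, 6))
```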

Griffin and Lubberink measure human capital as the number of employees working in business units in an SA2. That is really a measure of labour input, not human capital. To measure human capital (as well as labour), it would be better to also consider the education level of those employees, since more educated (not to mention more experienced) employees have more human capital. So, their measure is unlikely to pick up the important variation in human capital across SA2s, but it will pick up differences in labour input. But as a measure of combined labour and human capital, their measure will bias downwards measured human capital in urban areas, where education levels are highest, and bias upwards measured human capital in rural and remote areas, where education levels are lowest.

Griffin and Lubberink measure financial capital by the number of business units operating in an SA2. That is not financial capital. That is business density. The relationship between the number of firms and financial capital is not straightforward. An SA2 might have lots of small firms that have low aggregate financial capital, or one large firm that has a lot of financial capital.

Finally, we come back to natural capital, which is measured as noted above. However, some of the measures of biodiversity that Griffin and Lubberink use are better suited than others as a measure of natural capital. The definition of capital is important here - capital is stored up resources that can be used to produce things. Financial capital is stored up savings that can be used in the future. Human capital is stored up education and experience that can be used in the future. So, capital is a stock. It is not a flow.

Now, let's consider the BDA measures one by one. The first (BDA1 - Land Use) is "1 - the ratio of the number of agriculture and forestry business (primary industry) units in an SA2 to the total number of business units in an SA2". This is not really a measure of land use, because it isn't measured in terms of land. The relative size of the businesses is not taken into account, so many small farms would increase this measure compared to fewer large farms. It is also difficult to see how this is a measure of biodiversity.

The second measure (BDA2 - Infrastructure) is "1 - the rank of the number of business units in an SA2 to the land area in km2 of an SA2 divided by the total number of SA2 observations". It is difficult to understand why this BDA is measured as a rank, whereas BDA1 was not. It is also difficult to see how the number of firms is a measure of infrastructure, or how it relates to biodiversity. This measure will tend to be lower in urban areas, where many small businesses are clustered, than in rural areas. So, this is likely just a measure of urbanicity, not a measure of infrastructure or biodiversity.

The third measure (BDA3 - Mining) is "1 - ratio of the number of mining business units in an SA2 to the total number of business units in an SA2". Like BDA1, this doesn't account for the size of the mines. If you have a small quarry, that counts the same in this measure as the enormous Martha Mine in Waihi. It is more plausibly a measure of (negative) biodiversity than the other measures though. Or at least it would be, if the size of the businesses were taken into account.

The fourth measure is climate change in two forms (BDA4a - Climate Change, and BDA4b - Heat Spell Anomaly), which are measured as "the sum of the presence of a heat spell, cold spell, rain spell, or wind spell in an SA2 divided by 4" and "the rank of the heat spell anomalies in an SA2 divided by the total number of SA2 observations". They measure heat spells, cold spells, rain spells, and wind spells as the number of days on which the measured variable (temperature, rain, or wind) falls above (or below, for cold spells) the 'rolling mean 95th percentile' (it isn't clear what the term 'rolling mean 95th percentile' actually means). It isn't clear why adding those four up makes any sense, but perhaps you could just label them weather anomalies. In the second form of this measure, like BDA2 it isn't clear why the rank is used when the actual number of heat spells could be used instead. Again, this isn't really a direct measure of biodiversity, but to the extent that weather anomalies impede biodiversity, it may be a reasonable proxy.

The fifth measure (BDA5 - River Diversity) is "River condition × 100, where River condition = Percentage of insect and related species in an SA-located river compared to all possible species". This is probably the clearest actual biodiversity measure in the paper. However, it is still a narrow one, because although it captures the presence of insect and related species in rivers, it doesn't capture biodiversity more generally. It also doesn't consider the abundance of species. 

The sixth measure (BDA6 - Drinking Water) is "An indicator of the average improvement (higher BDA) or deterioration (lower BDA) in drinking water quality in a region based on periodic water testing". This measure is not a stock, it is a flow. It is a change over time, which gives no indication of the stock available for businesses to use in production. Since Griffin and Lubberink are interested in natural capital as a stock, it would have been better to use the level of drinking water quality, rather than the change in drinking water quality over time. This measure also has problems of reverse causality. Griffin and Lubberink use their measures as if they are business inputs. However, water quality is likely an output of business. Consider a dairy farm that reduces the water quality in a nearby stream. They have the causal relationship backwards when this variable is included in the analysis.
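To see why reverse causality matters, here is a minimal simulation (all numbers are hypothetical). By construction, business output lowers water quality, and water quality has no effect whatsoever on output. A regression of output on water quality nonetheless produces a clearly non-zero coefficient, which reflects only the effect running from business to water:

```python
import random


def simulate_reverse_causality(n=1000, seed=42):
    """Hypothetical data in which business output lowers water quality,
    but water quality has no effect at all on business output."""
    rng = random.Random(seed)
    output, water = [], []
    for _ in range(n):
        y = rng.uniform(0, 100)             # business output (arbitrary units)
        w = 80 - 0.4 * y + rng.gauss(0, 5)  # water quality falls as output rises
        output.append(y)
        water.append(w)
    return output, water


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


output, water = simulate_reverse_causality()
# Treating water quality as an 'input' and regressing output on it:
print(round(ols_slope(water, output), 2))
```

Here the coefficient is negative. In real data the sign could go either way, but in neither case can it be read as the effect of water quality on business output.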

The seventh measure (BDA7 - Plant Diseases) is "1 - percentage of plant diseases in an SA-unit compared to all possible plant diseases". Let's put aside the impossibility of measuring "all possible plant diseases". This might be a useful measure of (the lack of) biodiversity, but it would be better to directly measure plant biodiversity, rather than proxying for it by plant diseases.

The eighth measure (BDA8 - Matauranga) is "Percentage of SA2 population of Māori descent". This is a socio-cultural proxy for relationships with nature, not a measure of biodiversity.

The ninth measure (BDA9 - Population Density) is "1 - the rank of the population density in an SA2 divided by the total number of SA2 observations". Again, it isn't clear why the rank is used here, rather than actual population density. Also, like BDA2 this is a measure of urbanicity, not biodiversity.

The tenth measure (BDA10 - Possum Count) is "1 - the rank of the possum count in an SA2 divided by the total number of SA2 observations". Again, it isn't clear why the rank is used here, rather than some standardised measure of the actual possum count, or possums per land area. It is an indicator of biodiversity though, since more possums would typically mean fewer of other species.
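Several of the measures (BDA2, BDA4b, BDA9, and BDA10) use ranks. A short sketch shows what is lost: the transformation keeps only the ordering, so an SA2 with a thousand times more possums than its neighbour looks the same as one with only slightly more. This sketch works on a 0 to 1 scale (the paper's 0 to 100 scale is just a multiple), assumes rank 1 is the smallest value, and does not handle ties:

```python
def one_minus_rank_share(values):
    """Return 1 - rank/N for each value, where rank 1 is the smallest value.

    A sketch of the '1 - the rank ... divided by the total number of SA2
    observations' construction; tie handling and rank direction are assumptions.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [1 - rank / n for rank in ranks]


# Two hypothetical sets of possum counts across four SA2s.
modest = [1, 2, 3, 4]
extreme = [1, 2, 3, 4000]

print(one_minus_rank_share(modest))
print(one_minus_rank_share(extreme))
```

Both lists produce the same scores, so any information about how large the differences between SA2s actually are has been discarded.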

Finally, the eleventh measure (BDA11 - Non-Drought Probability) is "1 minus the ratio of the number of drought weather events in an SA divided by the sum of the number of drought plus non-drought weather events in an SA2". It's not clear what a 'non-drought weather event' is, or why this is a sensible measure. This measure is probably correlated with the climate change measures in BDA4 in any case.

So, across the eleven (or twelve, if you treat the two BDA4 measures as separate) BDA measures, there are only three that are really measures of biodiversity, and there are a few that are likely to be meaningfully correlated with biodiversity. The issue is not that every variable must be a perfect direct measure of biodiversity. Empirical research often relies on proxy measures. The issue is that the interpretation should match the proxy. A variable that measures urbanicity, business density, ethnicity, or weather anomalies may be related to biodiversity, but it is not itself biodiversity. If those variables are then combined into a single measure of 'natural capital', the interpretation becomes difficult. The estimated relationship may reflect biodiversity, but it may also reflect a mix of urbanicity, industry mix, infrastructure, climate, or demographic composition. Conflating urbanicity with biodiversity is an especially clear problem for Griffin and Lubberink's analysis, given that they multiply their BDA measures by SA2 land area when constructing their overall measure of natural capital, as I noted earlier.

Finally, Griffin and Lubberink attempt to exploit what they describe as a quasi-natural experiment. The idea is that a number of government policy changes in 2016 and 2017 were intended to improve the environment. If these policies successfully increased biodiversity, then the relationship between biodiversity and business output should become stronger after those policies were implemented. However, this is not a particularly convincing identification strategy. The policies were national, so there is no obvious untreated control group within New Zealand. The test is essentially asking whether the relationship between natural capital and business output changed after 2016 or 2017. But many other things could also have changed around the same time, including macroeconomic conditions, industry conditions, investment decisions, business confidence, and local economic trends. Moreover, the policies themselves may have affected firms through channels other than biodiversity, not least through expectations about future policy changes. That makes it difficult to interpret any post-2016 or post-2017 change as evidence that biodiversity caused higher business productivity. This part of the analysis instead shows that the estimated association between natural capital and business output is not stable over time, and that might be due to policy changes or any number of other reasons.

There are other issues that I could pick out as well, such as not including SA2 fixed effects in their analysis (so that time-invariant differences between SA2s are not controlled for). To be fair, including SA2 fixed effects would absorb much of the cross-sectional variation in biodiversity that the authors are trying to use. But that is exactly the problem, because without SA2 fixed effects, the estimates may reflect other time-invariant differences between SA2s, and not differences in biodiversity.
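The trade-off can be seen in a small sketch. Estimating with SA2 fixed effects is equivalent to subtracting each SA2's own mean from every variable. If a biodiversity measure does not change over time within an SA2 (the panel here is hypothetical), that transformation leaves it with no variation at all, so its coefficient cannot be estimated:

```python
from collections import defaultdict


def within_transform(rows, group_key, var):
    """Subtract each group's mean of `var` from every observation in that group
    (the 'within' transformation used to estimate a fixed-effects regression)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[var]
        counts[row[group_key]] += 1
    return [row[var] - sums[row[group_key]] / counts[row[group_key]] for row in rows]


# Hypothetical panel: the BDA score never changes within an SA2, but sales do.
panel = [
    {"sa2": "A", "year": 2016, "bda": 70.0, "log_sales": 10.0},
    {"sa2": "A", "year": 2017, "bda": 70.0, "log_sales": 10.5},
    {"sa2": "B", "year": 2016, "bda": 40.0, "log_sales": 12.0},
    {"sa2": "B", "year": 2017, "bda": 40.0, "log_sales": 12.5},
]

print(within_transform(panel, "sa2", "bda"))        # all zeros: no variation left
print(within_transform(panel, "sa2", "log_sales"))  # within-SA2 changes remain
```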

The overall takeaway from this paper is not that correlation is not the same as causation, it is that if you want to demonstrate correlation, you first need to use the right data in the right way. Biodiversity might be good for business. Business might be good for biodiversity. This research doesn't convincingly estimate the relationship between biodiversity and business output.

Tuesday, 2 June 2026

Genshin Impacts on Chinese trade

During the pandemic, when people were isolated at home, some people discovered a passion for sourdough. Others picked up a book. But plenty of people got (more) heavily into gaming. In late 2020, Genshin Impact was launched into that environment, and immediately exploded in popularity despite being released by a Chinese gaming studio little known to Western gamers. The interesting thing about Genshin Impact is that it doesn't 'Westernise' its Chinese foundations, and through that it may have opened a window to Chinese culture that many Western gamers wouldn't otherwise have noticed.

What effect, if any, did this have? That is essentially the question that this new article by Tianyu Wang (Jiangsu Provincial Academy of Social Sciences) and co-authors, published in the journal China Economic Review (sorry I don't see an ungated version online), tries to answer. Specifically, they look at the impact on Chinese exports, using a difference-in-differences (DiD) strategy. This involves comparing trade between China and countries with more, or less, exposure to Genshin Impact, between the periods before and after its release (which they set as October 2020, the first full month after the open beta of Genshin Impact was released on 28 September 2020). Their data is monthly export data from China to other countries, from the UN Comtrade database.
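The logic of a difference-in-differences design with a continuous treatment can be sketched in a few lines. This is a stylised two-period version with simulated data: the true effect, the exposure index, and all other numbers are hypothetical, and the paper itself uses monthly data and a richer regression. With two periods, regressing each country's change in log exports on its exposure removes fixed differences between countries (through differencing) and shocks common to all countries (through the intercept), and the slope recovers the effect of exposure:

```python
import random


def simulate_did(n_countries=200, true_beta=0.002, seed=1):
    """Simulate log exports before and after a release date, where the
    post-release change depends on a country's exposure index."""
    rng = random.Random(seed)
    exposure, change = [], []
    for _ in range(n_countries):
        g = rng.uniform(0, 100)           # hypothetical exposure (GTI-style, 0-100)
        country_level = rng.gauss(10, 1)  # fixed differences between countries
        common_shock = 0.05               # change hitting every country equally
        y_pre = country_level + rng.gauss(0, 0.05)
        y_post = country_level + common_shock + true_beta * g + rng.gauss(0, 0.05)
        exposure.append(g)
        change.append(y_post - y_pre)
    return exposure, change


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


exposure, change = simulate_did()
beta_hat = ols_slope(exposure, change)
print(f"estimated effect per unit of exposure: {beta_hat:.4f} (true: 0.0020)")
```

Because the outcome is in logs, a coefficient of 0.002 means roughly 0.2 percent higher exports per one-unit increase in exposure.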

However, there are a couple of oddities with the analysis. First, Wang et al. control for a variety of variables in their regression model. However, two of the variables they control for are the log of GDP and the log of GDP per capita. Because their model is a log-linear model, this means that they are unnecessarily controlling for GDP twice. To see why, consider this equation:

ln Y = a + b ln X + c ln[X/Z]

You can think of X as GDP and Z as population, so X/Z is GDP per capita. Since ln[X/Z] is equal to [ln X - ln Z], that equation is really:

ln Y = a + b ln X + c ln X - c ln Z = a + [b + c] ln X - c ln Z

So, the coefficients on both GDP and GDP per capita are not directly interpretable and a bit awkward. The coefficient on log GDP per capita in their model is actually the negative of the implied coefficient on log population, while the coefficient on log GDP captures only part of the association between GDP and exports (holding population constant, the full association is b + c, not b). Fortunately though, this just adds unnecessary complexity to their model. It doesn't bias the coefficients in the rest of the model.
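A quick numerical check of the algebra, using hypothetical coefficients: a model with log GDP and log GDP per capita gives exactly the same predictions as a model with log GDP and log population, with coefficients b + c and -c. The two parameterisations fit the data identically, and only the labels on the coefficients change:

```python
import math
import random

# Hypothetical coefficients for ln Y = a + b ln X + c ln[X/Z].
a, b, c = 1.0, 0.8, 0.5


def per_capita_form(X, Z):
    """Controls for ln GDP (X) and ln GDP per capita (X/Z)."""
    return a + b * math.log(X) + c * math.log(X / Z)


def population_form(X, Z):
    """The same model rewritten with ln GDP and ln population (Z)."""
    return a + (b + c) * math.log(X) - c * math.log(Z)


rng = random.Random(0)
for _ in range(5):
    X, Z = rng.uniform(1e3, 1e7), rng.uniform(1e2, 1e6)  # made-up GDP and population
    print(round(per_capita_form(X, Z), 10), round(population_form(X, Z), 10))
```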

Second, Wang et al. use Google Trends data as the treatment variable. This seems appropriate, because Google Trends will pick up differences in cross-country interest in Genshin Impact. Specifically, they create a Google Trends Index (GTI) that captures the search intensity for their term of interest. However, in their main analysis, they don't use a GTI based on searches for 'Genshin Impact'. Instead, they use a GTI based on searches for 'Sony'. Their explanation for that is:

There is evidence indicating that Sony and miHoYo maintain a very close relationship, and that Sony has played an important role in the global promotion of Genshin Impact.

They also say that:

...regressing China's exports directly on Genshin Impact GTI is highly endogenous...

Both of those statements may be true, and Wang et al. provide a variety of evidence in support of the close relationship between Sony and Genshin Impact. However, they don't provide similar evidence for why searches for 'Genshin Impact' would be endogenous in a way that searches for 'Sony' wouldn't. One possibility is that they are worried that search intensity for 'Genshin Impact' is correlated with countries' pre-existing closeness to China, or with pre-existing interest in Chinese cultural products. A difference-in-differences strategy, especially one that controls for country-level differences in pre-treatment trade, should already be controlling for those issues. However, time-varying shocks that are correlated with both Genshin Impact searches and Chinese exports after 2020 would remain. For example, the Genshin Impact GTI would also capture changes in favourability of views towards China that change for reasons other than Genshin Impact. Using the 'Sony' GTI may therefore reduce one problem, but it also introduces another, since Sony searches could reflect many things unrelated to Genshin Impact or China.

Fortunately, Wang et al. do report results based on the GTI for 'Genshin Impact' in their online appendix, and the results are not so different from what they get with the 'Sony' GTI. Apparently, this was suggested by one of the journal reviewers. Honestly, I think the results based on the 'Genshin Impact' GTI are the more plausible results, so I'm going to focus on them. And in those results, reported in Table D6 in the online appendix, they find that following the open beta release of Genshin Impact, every one-unit higher GTI for 'Genshin Impact' for a country is associated with a 0.215 percent increase in exports from China to that country. Unfortunately, they don't report the summary statistics for the 'Genshin Impact' GTI, so it is difficult to interpret. It is also difficult to interpret because the GTI is a normalised measure of search intensity relative to all Google searches in a given country and period. However, for comparison, the effect using the 'Genshin Impact' GTI is slightly larger than what they report for the 'Sony' GTI, which is a 0.186 percent increase in exports for each one-unit higher 'Sony' GTI.

Either way, the results suggest that countries where Genshin Impact was a bigger phenomenon experienced larger increases in exports from China than countries where Genshin Impact was less impactful. Wang et al. then turn to the mechanisms that might explain this change, using Pew Global Trends and Attitudes data. They report that:

Although we do not find evidence that Genshin Impact improved favorable perceptions of China, we do find evidence that it reduced unfavorable perceptions. This effect is primarily driven by a decline in mild aversion; there is no significant change in strong aversion. This result is intuitive—individuals who strongly dislike China are unlikely to revise their views solely because of a video game.

They also find that media narratives became more positive following Genshin Impact's release, for countries where the 'Sony' GTI was higher. However, this result is only suggestive as it was statistically insignificant.

One interesting final aspect of the paper is that Wang et al. used data on cultural distance to further explore the results, finding that:

...as bilateral cultural distance increases, the promotional effect of Genshin Impact on China's exports significantly diminishes.

So, Genshin Impact had a larger trade impact for countries with greater cultural similarity to China. That suggests that, while it might be an interesting narrative to suggest that Genshin Impact exposed the world to China, improving perceptions of China and increasing trade, the effect was actually concentrated on the countries that were already most similar to China.

This paper presents some interesting findings. However, it clearly isn't the last word on whether the international sharing of cultural products can have tangible effects on international trade, beyond their effects on the trade of the cultural product itself. It would be interesting to see if there are similar impacts for Korean cultural products, for example, or Bollywood movies (or Nollywood movies, for that matter).

Monday, 1 June 2026

Turkish inflation drives consumers to incur extreme shoe-leather costs

Inflation imposes costs on people. One of the costs of inflation is that it gives people strong incentives to spend time and effort avoiding higher prices. They can do that by reducing their cash holdings, searching harder for low prices, or, in extreme cases, travelling to shop elsewhere. When inflation is high, and prices are increasing rapidly, consumers have a strong incentive to spend a lot of time doing these things. Economists call these shoe-leather costs, because when consumers have to walk around a lot of stores in order to compare prices, their shoes wear out. At least, that's a literal explanation of the term. In an age where prices are published online, the actual act of 'walking around to compare prices' is a lot easier on the shoes. Or is it? An extreme example has been playing out recently, as reported in Bloomberg last November (paywalled, but you can find an ungated version here):

Almost every month, Cihan Citak gets into his car, passport in hand, and sets off from Istanbul to Alexandroupolis, a Greek seaside city 40 kilometers (25 miles) from the Turkish border. After a roughly four-hour drive, he walks the crowded aisles of the local supermarket, filling his cart with wine, cheese and other groceries that cost a fraction of what they do back home...

Cross-border retail has become routine for many who found that Turkey’s surging food prices and stronger lira make Greece a cheaper alternative for everyday purchases. The trend, while not new, is accelerating: 6% of all Turks crossing the border to Greece in the first nine months of the year were on a shopping run, the highest share of overall travelers since at least 2012, data from the country’s statistics agency show.

When inflation causes people to drive four hours in order to find lower prices, you know the shoe-leather costs must be high. The inflation rate in Türkiye is over 30 percent. That isn't hyper-inflation, but it is very high. For comparison, in New Zealand the inflation rate spiked at about 7 percent just after the pandemic, which was the highest it had been in over 30 years. More recently, inflation has been between 2.5 and 3.5 percent, which is at the upper end of, or above, the one to three percent range that the Reserve Bank is mandated to maintain in the medium to long term.

All of that is to say that Türkiye’s much higher inflation creates much stronger incentives for consumers to incur shoe-leather costs to avoid higher prices than is currently the case in New Zealand.
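A little arithmetic shows how different those incentives are. This sketch treats the rounded figures from the text (about 30 percent for Türkiye and about 3 percent for New Zealand) as constant annual rates, which is a simplifying assumption:

```python
import math


def monthly_rate(annual_rate):
    """Constant monthly rate that compounds to the given annual rate."""
    return (1 + annual_rate) ** (1 / 12) - 1


def years_to_double(annual_rate):
    """Years for the price level to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


for country, annual in [("Türkiye", 0.30), ("New Zealand", 0.03)]:
    print(f"{country}: prices rise about {monthly_rate(annual):.2%} per month "
          f"and double in about {years_to_double(annual):.1f} years")
```

At 30 percent a year, prices rise by roughly 2.2 percent every month and double in under three years. At 3 percent, they rise by about 0.25 percent a month and take more than 23 years to double. Holding cash, or putting off a purchase, for a month is therefore far more costly in Türkiye.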

[HT: New Zealand Herald, also paywalled]