Wednesday, 3 June 2026

This research doesn’t convincingly show that biodiversity is good for business

I was interested to read this article in The Conversation last month by Paul Griffin (University of California, Davis) and Martien Lubberink (Victoria University of Wellington), mainly because of statements like this:

...firms operating in areas with richer biodiversity are measurably more productive.

I thought, that's interesting. This might be a good example to use in class next trimester to illustrate the difference between correlation and causation. After all, the authors may be correct that firms operating in areas with richer biodiversity are more productive (correlation), but that doesn't mean that biodiversity increases productivity (causation).

And then I read the paper that The Conversation article was based on. And at that point, I decided that I shouldn't use this as an example of the difference between correlation and causation, because even the correlations that they find are shaky at best.

The approach that Griffin and Lubberink take is to look at the relationship between measures of business output and measures of biodiversity. Their measure of business output is sales or gross profit, taken from Stats NZ's Longitudinal Business Database, and they generally interpret this as a measure of productivity. That is the first problem with the paper. Sales can be interpreted as gross revenue, and in some contexts sales may be used as a rough measure of gross output. But sales are not a good measure of productivity, and they are not a good measure of the economic value created by a business. The more appropriate measure would be value added, or at least something closer to profit.

To see why, consider two firms that both produce a product that sells for $1,000 per unit, and both firms sell 1,000 units per month. Both firms have sales of $1 million per month. Firm A buys the product wholesale at a cost of $800 per unit, then adds a mark-up. The value added of Firm A is $200,000 per month. Firm B buys raw materials of $200 per unit, adds labour of $300 per unit, and then sells the product. The value added of Firm B is $800,000 per month (sales less the $200 per unit of bought-in materials; the wages it pays are part of the value it adds, and what is left after paying them is $500,000). Firm B creates a lot more economic value than Firm A, and yet measured by sales they are the same. Sales are therefore a poor measure of productivity. Gross profit is less problematic, because it subtracts at least some intermediate input costs, but even gross profit is not a pure measure of value added or productivity.
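The arithmetic in this two-firm example can be checked with a few lines of Python. This is a sketch that uses the standard national-accounts definition of value added (sales less bought-in intermediate inputs), under which wages are part of a firm's value added rather than a deduction from it; it also reports what Firm B has left after paying for labour. Firm A is assumed to have no labour costs, since the example doesn't mention any.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    name: str
    price: float              # sale price per unit ($)
    units: int                # units sold per month
    intermediate_cost: float  # bought-in goods or materials per unit ($)
    labour_cost: float = 0.0  # wages per unit ($): part of value added, not an intermediate input

    def sales(self) -> float:
        return self.price * self.units

    def value_added(self) -> float:
        # Value added = sales minus intermediate inputs
        return (self.price - self.intermediate_cost) * self.units

    def surplus_after_labour(self) -> float:
        # What remains once both intermediate inputs and wages are paid
        return (self.price - self.intermediate_cost - self.labour_cost) * self.units


firm_a = Firm("A", price=1000, units=1000, intermediate_cost=800)
firm_b = Firm("B", price=1000, units=1000, intermediate_cost=200, labour_cost=300)

for f in (firm_a, firm_b):
    print(f"Firm {f.name}: sales ${f.sales():,.0f}, value added ${f.value_added():,.0f}, "
          f"after labour ${f.surplus_after_labour():,.0f}")
```

Both firms report sales of $1,000,000, but Firm B's value added is four times Firm A's.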

As a measure of biodiversity, or more accurately as a set of proxies for biodiversity-related conditions and pressures, Griffin and Lubberink use a variety of indicators that they call 'biodiversity abundance markers' (for which, for some reason, they use the acronym BDAs). They aggregate data from a range of sources for their various BDAs (which I will discuss in further detail below), with the data at the SA2 level (SA2s are geographical areas approximately the size of suburbs in urban areas, and larger in rural or remote areas). They note that:

For each SA2, we define a vector of “biodiversity abundance markers” (BDAs), where each ranges from 0 to 100. We denote these ranks as BDA1, BDA2, … , BDAm. We then assign them to an SA2 and, therefore, to the businesses and employees in the same SA2. For a given BDA in an SA2, BDAm = 0 means complete biodiversity loss (high pressure from biodiversity loss) for marker m; BDAm = 100 (low pressure from biodiversity loss) is equivalent to an SA2 with an undisturbed or fully intact natural state.

So far, so good. The only issue with that approach is that the measures of biodiversity don't have a natural interpretation, because they are just an index. But we often work with indices - you just need to be cautious about how you interpret the magnitude of the effects. Griffin and Lubberink start by showing the correlation between each of their BDA measures and their measures of business output.

However, then they want to create an overall index of biodiversity, and to do this they:

...multiply each BDA by its SA2 land area and denote the result as an empirical proxy for the natural capital (n) of an SA2 applicable to the businesses operating therein.

Remember that the BDA is an index, bounded between 0 and 100, and it has no natural interpretation in terms of magnitude. So, multiplying the index by the land area of the SA2 is not meaningful, because the BDA is not a measured biodiversity stock per square kilometre. I guess it might make sense if you wanted to calculate a weighted average index, where the weights are based on SA2 land areas, but that isn't what Griffin and Lubberink are doing. Their approach is problematic because it mechanically causes the measured biodiversity to be higher in rural areas ceteris paribus (holding all else equal), where SA2s are larger, and lower in urban areas, where SA2s are smaller. Within urban areas, ceteris paribus it causes higher measured biodiversity in industrial and commercial areas, where SA2s are larger, and lower in residential areas, where SA2s are smaller.
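The sketch below, using two made-up SA2s, shows the mechanical effect: an urban SA2 scoring 90 on the index ends up with far less measured 'natural capital' than a rural SA2 scoring 10, simply because it is smaller. For contrast, it also computes a land-area-weighted average of the index, the alternative mentioned above, which stays on the index's 0-100 scale. The function names are hypothetical.

```python
def natural_capital_proxy(bda, area_km2):
    """Construction described in the paper: BDA index (0-100) times SA2 land area."""
    return bda * area_km2


def area_weighted_mean(bdas, areas_km2):
    """Hypothetical alternative: land-area-weighted average of the index."""
    return sum(b * a for b, a in zip(bdas, areas_km2)) / sum(areas_km2)


# Hypothetical SA2s: a small urban one with a high index, a large rural one with a low index
urban_bda, urban_area = 90, 2
rural_bda, rural_area = 10, 500

print(natural_capital_proxy(urban_bda, urban_area))  # 180
print(natural_capital_proxy(rural_bda, rural_area))  # 5000
print(round(area_weighted_mean([urban_bda, rural_bda], [urban_area, rural_area]), 1))  # 10.3
```

The product rises one-for-one with land area, so doubling an SA2's size doubles its 'natural capital' even if nothing about its biodiversity changes.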

Griffin and Lubberink then aggregate their index-multiplied-by-land-area measures in various ways. The aggregation approach they adopt is fine, but when you aggregate numbers that are not individually meaningful, the result is not meaningful either.

But let's take a step back, because there is another problem. Griffin and Lubberink pitch their analysis as based on a Cobb-Douglas production function. That is fine - a Cobb-Douglas function is a way of relating inputs to output. We already know that their measure of output is faulty. Their inputs are also faulty. Their three-factor Cobb-Douglas function includes inputs of financial capital, human capital, and natural capital.
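For reference, the textbook three-factor Cobb-Douglas production function, and the log-linear form in which it is usually estimated, are written out below. The notation is mine rather than the paper's: Y is output, F, H and N are financial, human and natural capital, A is total factor productivity, and i indexes the unit of observation. Griffin and Lubberink's exact estimating equation may differ.

```latex
Y_i = A \, F_i^{\alpha} \, H_i^{\beta} \, N_i^{\gamma}
\quad\Longrightarrow\quad
\ln Y_i = \ln A + \alpha \ln F_i + \beta \ln H_i + \gamma \ln N_i
```

If natural capital does enter in logs and is constructed as an index multiplied by land area, then ln N_i = ln BDA_i + ln(area_i), so log land area enters the regression with exactly the same coefficient (γ) as the biodiversity index itself.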

Griffin and Lubberink measure human capital as the number of employees working in business units in an SA2. That is really a measure of labour input, not human capital. To measure human capital (as well as labour), it would be better to also consider the education level of those employees, since more educated (not to mention more experienced) employees have more human capital. So, their measure is unlikely to pick up the important variation in human capital across SA2s, but it will pick up differences in labour input. But as a measure of combined labour and human capital, their measure will bias downwards measured human capital in urban areas, where education levels are highest, and bias upwards measured human capital in rural and remote areas, where education levels are lowest.

Griffin and Lubberink measure financial capital by the number of business units operating in an SA2. That is not financial capital. That is business density. The relationship between the number of firms and financial capital is not straightforward. An SA2 might have lots of small firms that have low aggregate financial capital, or one large firm that has a lot of financial capital.

Finally, we come back to natural capital, which is measured as noted above. However, some of the measures of biodiversity that Griffin and Lubberink use are better suited than others as a measure of natural capital. The definition of capital is important here - capital is stored up resources that can be used to produce things. Financial capital is stored up savings that can be used in the future. Human capital is stored up education and experience that can be used in the future. So, capital is a stock. It is not a flow.

Now, let's consider the BDA measures one by one. The first (BDA1 - Land Use) is "1 - the ratio of the number of agriculture and forestry business (primary industry) units in an SA2 to the total number of business units in an SA2". This is not really a measure of land use, because it isn't measured in terms of land. The relative size of the businesses is not taken into account, so many small farms would increase this measure compared to fewer large farms. It is also difficult to see how this is a measure of biodiversity.
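A small sketch of why counting business units is not the same as measuring land use. It applies the quoted BDA1 formula (scaled to 0-100, on the assumption that this matches the paper's stated range) to two hypothetical SA2s of the same size, and compares it with a hypothetical land-based alternative: one minus the share of the SA2's land that is farmed.

```python
def bda1_count_based(primary_units, total_units):
    """BDA1 as quoted: 1 minus the share of business units in agriculture and forestry.
    Scaled to 0-100 here, assuming that matches the paper's range."""
    return 100 * (1 - primary_units / total_units)


def land_based_share(farm_area_ha, sa2_area_ha):
    """Hypothetical alternative: 1 minus the share of SA2 land that is farmed (0-100)."""
    return 100 * (1 - farm_area_ha / sa2_area_ha)


# Two hypothetical SA2s, each 10,000 ha with 100 business units in total.
# SA2 X has ten 50 ha farms; SA2 Y has a single 5,000 ha farm.
x_count, x_land = bda1_count_based(10, 100), land_based_share(10 * 50, 10_000)
y_count, y_land = bda1_count_based(1, 100), land_based_share(5_000, 10_000)
print(f"X: count-based {x_count:.0f}, land-based {x_land:.0f}")  # X: count-based 90, land-based 95
print(f"Y: count-based {y_count:.0f}, land-based {y_land:.0f}")  # Y: count-based 99, land-based 50
```

The count-based measure rates SA2 Y as the less pressured of the two, even though ten times as much of its land is farmed. The same issue applies to BDA3 and mine sizes.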

The second measure (BDA2 - Infrastructure) is "1 - the rank of the number of business units in an SA2 to the land area in km2 of an SA2 divided by the total number of SA2 observations". It is difficult to understand why this BDA is measured as a rank, whereas BDA1 was not. It is also difficult to see how the number of firms is a measure of infrastructure, or how it relates to biodiversity. This measure will tend to be lower in urban areas, where many small businesses are clustered, than in rural areas. So, this is likely just a measure of urbanicity, not a measure of infrastructure or biodiversity.
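Several of the BDAs (BDA2, BDA4b, BDA9 and BDA10) are built as 1 minus a rank divided by the number of SA2s. The sketch below is one reading of that formula, assuming rank 1 is the smallest value, no ties, and scaling by 100; it shows that the rank transformation discards the size of the differences between SA2s.

```python
def rank_bda(values):
    """100 * (1 - rank / N), with rank 1 for the smallest value (assumed; ties not handled)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [100 * (1 - r / n) for r in ranks]


# Hypothetical business units per km² in four SA2s
modest = [1, 2, 3, 4]
extreme = [1, 2, 3, 1000]  # the fourth SA2 is vastly denser, but its rank is unchanged
print(rank_bda(modest))   # [75.0, 50.0, 25.0, 0.0]
print(rank_bda(extreme))  # [75.0, 50.0, 25.0, 0.0]
```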

The third measure (BDA3 - Mining) is "1 - ratio of the number of mining business units in an SA2 to the total number of business units in an SA2". Like BDA1, this doesn't account for the size of the mines. If you have a small quarry, that counts the same in this measure as the enormous Martha Mine in Waihi. It is more plausibly a measure of (negative) biodiversity than the other measures, though. Or at least it would be, if the size of the businesses were taken into account.

The fourth measure is climate change in two forms (BDA4a - Climate Change, and BDA4b - Heat Spell Anomaly), which are measured as "the sum of the presence of a heat spell, cold spell, rain spell, or wind spell in an SA2 divided by 4" and "the rank of the heat spell anomalies in an SA2 divided by the total number of SA2 observations". They measure heat spells, cold spells, rain spells, and wind spells as the number of days on which the measured variable (temperature, rain, or wind) falls above (or below, for cold spells) the 'rolling mean 95th percentile' (it isn't clear what the term 'rolling mean 95th percentile' actually means). Nor is it clear why adding those four up makes any sense, but perhaps you could just label them weather anomalies. In the second form of this measure, as with BDA2, it isn't clear why the rank is used when the actual number of heat spells could be used instead. Again, this isn't really a direct measure of biodiversity, but to the extent that weather anomalies impede biodiversity, it may be a reasonable proxy.
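Because the 'rolling mean 95th percentile' rule is unclear, the sketch below implements just one possible reading of it, purely as an assumption: a day counts towards a spell if its value is above the 95th percentile of the preceding 30 days (or below the 5th percentile, for cold spells). The function name spell_days and the 30-day window are hypothetical. The second part applies the quoted BDA4a formula to presence indicators and shows that it can only ever take five values, however many spell days an SA2 has.

```python
import statistics
from itertools import product


def spell_days(series, window=30, cold=False):
    """Hypothetical reading of the paper's rule: count days beyond the trailing-window
    95th percentile (or below the 5th percentile when cold=True)."""
    days = 0
    for t in range(window, len(series)):
        cuts = statistics.quantiles(series[t - window:t], n=20)  # cut points at 5%, 10%, ..., 95%
        beyond = series[t] < cuts[0] if cold else series[t] > cuts[-1]
        if beyond:
            days += 1
    return days


def bda4a(heat, cold, rain, wind):
    """BDA4a as quoted: presence (0/1) of each spell type, summed and divided by 4.
    'Presence' is assumed to mean at least one spell day."""
    return (heat + cold + rain + wind) / 4


# Every combination of presence indicators yields one of only five values
print(sorted({bda4a(*flags) for flags in product([0, 1], repeat=4)}))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Under this reading, an SA2 with one cold-spell day scores the same as one with thirty heat-spell days.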

The fifth measure (BDA5 - River Diversity) is "River condition × 100, where River condition = Percentage of insect and related species in an SA-located river compared to all possible species". This is probably the clearest actual biodiversity measure in the paper. However, it is still a narrow one, because although it captures the presence of insect and related species in rivers, it doesn't capture biodiversity more generally. It also doesn't consider the abundance of species. 

The sixth measure (BDA6 - Drinking Water) is "An indicator of the average improvement (higher BDA) or deterioration (lower BDA) in drinking water quality in a region based on periodic water testing". This measure is not a stock; it is a flow. It is a change over time, which gives no indication of the stock available for businesses to use in production. Since Griffin and Lubberink are interested in natural capital as a stock, it would have been better to use the level of drinking water quality, rather than the change in drinking water quality over time. This measure also has problems of reverse causality. Griffin and Lubberink use their measures as if they are business inputs. However, water quality is likely, at least in part, an outcome of business activity. Consider a dairy farm that reduces the water quality in a nearby stream: treating water quality as an input to that farm's production gets the causal relationship backwards.
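A minimal sketch of the stock-versus-flow distinction, with made-up water-quality scores: two regions improve by the same amount between tests, so a change-based measure scores them identically, even though the level of water quality available to businesses is very different.

```python
def latest_level(scores):
    """Stock-like measure: the most recent water-quality score."""
    return scores[-1]


def average_change(scores):
    """Flow-like measure: the average change between successive tests."""
    changes = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return sum(changes) / len(changes)


region_a = [88, 89, 90]  # hypothetical: high quality, improving slowly
region_b = [38, 39, 40]  # hypothetical: low quality, improving at the same rate
print(average_change(region_a), average_change(region_b))  # 1.0 1.0
print(latest_level(region_a), latest_level(region_b))      # 90 40
```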

The seventh measure (BDA7 - Plant Diseases) is "1 - percentage of plant diseases in an SA-unit compared to all possible plant diseases". Let's put aside the impossibility of measuring "all possible plant diseases". This might be a useful measure of (the lack of) biodiversity, but it would be better to directly measure plant biodiversity, rather than proxying for it by plant diseases.

The eighth measure (BDA8 - Matauranga) is "Percentage of SA2 population of Māori descent". This is a socio-cultural proxy for relationships with nature, not a measure of biodiversity.

The ninth measure (BDA9 - Population Density) is "1 - the rank of the population density in an SA2 divided by the total number of SA2 observations". Again, it isn't clear why the rank is used here, rather than actual population density. Also, like BDA2 this is a measure of urbanicity, not biodiversity.

The tenth measure (BDA10 - Possum Count) is "1 - the rank of the possum count in an SA2 divided by the total number of SA2 observations". Again, it isn't clear why the rank is used here, rather than some standardised measure of the actual possum count, or possums per land area. It is an indicator of biodiversity though, since more possums would typically mean fewer of other species.

Finally, the eleventh measure (BDA11 - Non-Drought Probability) is "1 minus the ratio of the number of drought weather events in an SA divided by the sum of the number of drought plus non-drought weather events in an SA2". It's not clear what a 'non-drought weather event' is, or why this is a sensible measure. This measure is probably correlated with the climate change measures in BDA4 in any case.

So, across the eleven (or twelve, if you treat the two BDA4 measures as separate) BDA measures, there are only three that are really measures of biodiversity, and there are a few that are likely to be meaningfully correlated with biodiversity. The issue is not that every variable must be a perfect direct measure of biodiversity. Empirical research often relies on proxy measures. The issue is that the interpretation should match the proxy. A variable that measures urbanicity, business density, ethnicity, or weather anomalies may be related to biodiversity, but it is not itself biodiversity. If those variables are then combined into a single measure of 'natural capital', the interpretation becomes difficult. The estimated relationship may reflect biodiversity, but it may also reflect a mix of urbanicity, industry mix, infrastructure, climate, or demographic composition. Conflating urbanicity with biodiversity is an especially clear problem for Griffin and Lubberink's analysis, given that they multiply their BDA measures by SA2 land area when constructing their overall measure of natural capital, as I noted earlier.

Finally, Griffin and Lubberink attempt to exploit what they describe as a quasi-natural experiment. The idea is that a number of government policy changes in 2016 and 2017 were intended to improve the environment. If these policies successfully increased biodiversity, then the relationship between biodiversity and business output should become stronger after those policies were implemented. However, this is not a particularly convincing identification strategy. The policies were national, so there is no obvious untreated control group within New Zealand. The test is essentially asking whether the relationship between natural capital and business output changed after 2016 or 2017. But many other things could also have changed around the same time, including macroeconomic conditions, industry conditions, investment decisions, business confidence, and local economic trends. Moreover, the policies themselves may have affected firms through channels other than biodiversity, not least through expectations about future policy changes. That makes it difficult to interpret any post-2016 or post-2017 change as evidence that biodiversity caused higher business productivity. This part of the analysis instead shows that the estimated association between natural capital and business output is not stable over time, and that might be due to policy changes or any number of other reasons.

There are other issues that I could pick out as well, such as not including SA2 fixed effects in their analysis (so that time-invariant differences between SA2s are not controlled for). To be fair, including SA2 fixed effects would absorb much of the cross-sectional variation in biodiversity that the authors are trying to use. But that is exactly the problem, because without SA2 fixed effects, the estimates may reflect other time-invariant differences between SA2s, and not differences in biodiversity.
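A toy illustration of the fixed-effects point, with made-up numbers: three SA2s observed in two years, where each SA2 has a time-invariant advantage (standing in for anything like urbanicity or industry mix) that is correlated with its biodiversity score, and where the true within-SA2 effect of the score on output is set to 0.5 by construction. Fixed effects are implemented here by demeaning within each SA2.

```python
def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def demean_by_group(values, groups):
    """Subtract each group's mean: the 'within' transformation behind fixed effects."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical panel: two years for each of three SA2s
sa2 = ["A", "A", "B", "B", "C", "C"]
bda = [10, 12, 50, 52, 90, 92]
sa2_effect = {"A": 0, "B": 100, "C": 200}  # time-invariant, correlated with bda
output = [sa2_effect[g] + 0.5 * b for g, b in zip(sa2, bda)]

pooled = slope(bda, output)  # no SA2 fixed effects
within = slope(demean_by_group(bda, sa2), demean_by_group(output, sa2))  # with SA2 fixed effects
print(round(pooled, 2), within)  # 3.0 0.5
```

The pooled slope is about six times the true effect because it picks up the SA2-level differences. The within estimate recovers 0.5, but only from the small year-to-year changes inside each SA2, which is the trade-off described above.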

The overall takeaway from this paper is not that correlation is not the same as causation; it is that if you want to demonstrate correlation, you first need to use the right data in the right way. Biodiversity might be good for business. Business might be good for biodiversity. This research doesn't convincingly estimate the relationship between biodiversity and business output.

Tuesday, 2 June 2026

Genshin Impacts on Chinese trade

During the pandemic, when people were isolated at home, some people discovered a passion for sourdough. Others picked up a book. But plenty of people got (more) heavily into gaming. In late 2020, Genshin Impact was launched into that environment, and immediately exploded in popularity despite being released by a Chinese gaming studio little known to Western gamers. The interesting thing about Genshin Impact is that it doesn't 'Westernise' its Chinese foundations, and through that it may have opened a window to Chinese culture that many Western gamers wouldn't otherwise have noticed.

What effect, if any, did this have? That is essentially the question that this new article by Tianyu Wang (Jiangsu Provincial Academy of Social Sciences) and co-authors, published in the journal China Economic Review (sorry, I don't see an ungated version online), tries to answer. Specifically, they look at the impact on Chinese exports, using a difference-in-differences (DiD) strategy. This involves comparing trade between China and countries with more, or less, exposure to Genshin Impact, between the period before and after its release (which they set as October 2020, the first full month after the open beta of Genshin Impact was released on 28 September 2020). Their data is monthly export data from China to other countries, from the UN Comtrade database.
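A generic version of this kind of continuous-treatment difference-in-differences regression is sketched below. The notation is mine, not Wang et al.'s: i indexes importing countries, t indexes months, Post_t equals one from October 2020 onwards, GTI_i is the country's search intensity, and X_it stands for the control variables. Their exact specification, including which fixed effects and controls they use, may differ.

```latex
\ln \mathit{Exports}_{it} = \alpha_i + \delta_t + \beta \, (\mathit{GTI}_i \times \mathit{Post}_t) + X_{it}'\theta + \varepsilon_{it}
```

In this form, β measures how much more Chinese exports grew after the release to countries with higher search intensity, relative to countries with lower search intensity. The country effects α_i absorb stable differences between countries, and the month effects δ_t absorb shocks common to all countries.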

However, there are a couple of oddities with the analysis. First, Wang et al. control for a variety of variables in their regression model. However, two of the variables they control for are the log of GDP and the log of GDP per capita. Because their model is a log-linear model, this means that they are unnecessarily controlling for GDP twice. To see why, consider this equation:

ln Y = a + b ln X + c ln(X/Z)

You can think of X as GDP and Z as population, so X/Z is GDP per capita. Since ln(X/Z) is equal to ln X - ln Z, that equation is really:

ln Y = a + b ln X + c ln X - c ln Z = a + (b + c) ln X - c ln Z

So, the coefficients on both GDP and GDP per capita are awkward and not directly interpretable. The coefficient on log GDP per capita (c) is actually the negative of the coefficient on log population, while the coefficient on log GDP (b) is not the elasticity of exports with respect to GDP: holding population constant, that elasticity is b + c. Fortunately, though, this only adds unnecessary complexity to their model. It doesn't bias the coefficients in the rest of the model.
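A quick numerical check of this algebra, using made-up data for six countries and numpy's least-squares solver. The outcome is generated exactly as ln Y = 1 + 0.5 ln X + 0.3 ln Z (no noise), and then estimated two ways: once with log GDP and log population, and once with log GDP and log GDP per capita.

```python
import numpy as np

# Hypothetical data: log GDP (ln X) and log population (ln Z)
ln_x = np.array([10.0, 11.0, 12.5, 9.0, 13.0, 11.5])
ln_z = np.array([1.0, 2.5, 3.0, 0.5, 4.0, 1.5])
# Outcome generated exactly as ln Y = 1 + 0.5 ln X + 0.3 ln Z
ln_y = 1.0 + 0.5 * ln_x + 0.3 * ln_z


def ols(y, *regressors):
    """Least-squares coefficients for y on a constant plus the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Specification A: log GDP and log population
a_A, gdp_A, pop_A = ols(ln_y, ln_x, ln_z)
# Specification B: log GDP and log GDP per capita (ln X - ln Z)
a_B, gdp_B, gdppc_B = ols(ln_y, ln_x, ln_x - ln_z)
print(np.round([gdp_A, pop_A], 3), np.round([gdp_B, gdppc_B], 3))
```

Specification B returns roughly 0.8 on log GDP and -0.3 on log GDP per capita: the per-capita coefficient is the negative of the population coefficient in specification A, and the two coefficients in B sum to the 0.5 GDP elasticity. The fit, and anything else in the model, is identical either way.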

Second, Wang et al. use Google Trends data as the treatment variable. This seems appropriate, because Google Trends will pick up differences in cross-country interest in Genshin Impact. Specifically, they create a Google Trends Index (GTI) that captures the search intensity for their term of interest. However, in their main analysis, they don't use a GTI based on searches for 'Genshin Impact'. Instead, they use a GTI based on searches for 'Sony'. Their explanation for that is:

There is evidence indicating that Sony and miHoYo maintain a very close relationship, and that Sony has played an important role in the global promotion of Genshin Impact.

They also say that:

...regressing China's exports directly on Genshin Impact GTI is highly endogenous...

Both of those statements may be true, and Wang et al. provide a variety of evidence in support of the close relationship between Sony and Genshin Impact. However, they don't provide similar evidence for why searches for 'Genshin Impact' would be endogenous in a way that searches for 'Sony' wouldn't. One possibility is that they are worried that search intensity for 'Genshin Impact' is correlated with countries' pre-existing closeness to China, or with pre-existing interest in Chinese cultural products. A difference-in-differences strategy, especially one that controls for country-level differences in pre-treatment trade, should already be controlling for those issues. However, time-varying shocks that are correlated with both Genshin Impact searches and Chinese exports after 2020 would remain. For example, the Genshin Impact GTI would also capture changes in favourability of views towards China that change for reasons other than Genshin Impact. Using the 'Sony' GTI may therefore reduce one problem, but it also introduces another, since Sony searches could reflect many things unrelated to Genshin Impact or China.

Fortunately, Wang et al. do report results based on the GTI for 'Genshin Impact' in their online appendix, and the results are not so different from what they get with the 'Sony' GTI. Apparently, this was suggested by one of the journal reviewers. Honestly, I think the results based on the 'Genshin Impact' GTI are the more plausible results, so I'm going to focus on them. And in those results, reported in Table D6 in the online appendix, they find that following the open beta release of Genshin Impact, every one-unit higher GTI for 'Genshin Impact' for a country is associated with a 0.215 percent increase in exports from China to that country. Unfortunately, they don't report the summary statistics for the 'Genshin Impact' GTI, so it is difficult to interpret. It is also difficult to interpret because the GTI is a normalised measure of search intensity relative to all Google searches in a given country and period. However, for comparison, the effect using the 'Genshin Impact' GTI is slightly larger than what they report for the 'Sony' GTI, which is a 0.186 percent increase in exports for each one-unit higher 'Sony' GTI.

Either way, the results suggest that countries where Genshin Impact was a bigger phenomenon experienced larger increases in exports from China than countries where Genshin Impact was less impactful. Wang et al. then turn to the mechanisms that might explain this change, using Pew Global Trends and Attitudes data. They report that:

Although we do not find evidence that Genshin Impact improved favorable perceptions of China, we do find evidence that it reduced unfavorable perceptions. This effect is primarily driven by a decline in mild aversion; there is no significant change in strong aversion. This result is intuitive—individuals who strongly dislike China are unlikely to revise their views solely because of a video game.

They also find that media narratives became more positive following Genshin Impact's release, for countries where the 'Sony' GTI was higher. However, this result is only suggestive, as it was not statistically significant.

One interesting final aspect of the paper is that Wang et al. used data on cultural distance to further explore the results, finding that:

...as bilateral cultural distance increases, the promotional effect of Genshin Impact on China's exports significantly diminishes.

So, Genshin Impact had a larger trade impact for countries with greater cultural similarity to China. That suggests that, while it might be an interesting narrative to suggest that Genshin Impact exposed the world to China, improving perceptions of China and increasing trade, the effect was actually concentrated on the countries that were already most similar to China.

This paper presents some interesting findings. However, it clearly isn't the last word on whether the international sharing of cultural products can have tangible effects on international trade, beyond their effects on the trade of the cultural product itself. It would be interesting to see if there are similar impacts for Korean cultural products, for example, or Bollywood movies (or Nollywood movies, for that matter).

Monday, 1 June 2026

Turkish inflation drives consumers to incur extreme shoe-leather costs

Inflation imposes costs on people. One of the costs of inflation is that it gives people strong incentives to spend time and effort avoiding higher prices. They can do that by reducing their cash holdings, searching harder for low prices, or, in extreme cases, travelling to shop elsewhere. When inflation is high, and prices are increasing rapidly, consumers have a strong incentive to spend a lot of time doing these things. Economists call these shoe-leather costs, because when consumers have to walk around a lot of stores in order to compare prices, their shoes wear out. At least, that's a literal explanation of the term. In an age where prices are published online, the actual act of 'walking around to compare prices' is a lot easier on the shoes. Or is it? An extreme example has been playing out recently, as reported in Bloomberg last November (paywalled, but you can find an ungated version here):

Almost every month, Cihan Citak gets into his car, passport in hand, and sets off from Istanbul to Alexandroupolis, a Greek seaside city 40 kilometers (25 miles) from the Turkish border. After a roughly four-hour drive, he walks the crowded aisles of the local supermarket, filling his cart with wine, cheese and other groceries that cost a fraction of what they do back home...

Cross-border retail has become routine for many who found that Turkey’s surging food prices and stronger lira make Greece a cheaper alternative for everyday purchases. The trend, while not new, is accelerating: 6% of all Turks crossing the border to Greece in the first nine months of the year were on a shopping run, the highest share of overall travelers since at least 2012, data from the country’s statistics agency show.

When inflation causes people to drive four hours in order to find lower prices, you know the shoe-leather costs must be high. The inflation rate in Türkiye is over 30 percent. That isn't hyperinflation, but it is very high. For comparison, in New Zealand the inflation rate peaked at about 7 percent just after the pandemic, and that was the highest it had been in over 30 years. Inflation more recently has been between 2.5 and 3.5 percent, which is at or above the upper end of the Reserve Bank's mandate to keep inflation between one and three percent in the medium to long term.
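To put these annual rates on the roughly monthly horizon of Citak's shopping trips, the sketch below converts an annual inflation rate into the compound monthly rate that accumulates to it over twelve months. The rates are the approximate figures above, with 3 percent standing in for New Zealand's recent range.

```python
def monthly_equivalent(annual_rate):
    """Compound monthly rate that accumulates to `annual_rate` over 12 months."""
    return (1 + annual_rate) ** (1 / 12) - 1


for label, annual in [
    ("Türkiye (about 30%)", 0.30),
    ("New Zealand post-pandemic peak (about 7%)", 0.07),
    ("New Zealand recently (about 3%)", 0.03),
]:
    print(f"{label}: {monthly_equivalent(annual):.2%} per month")
```

On these figures, prices in Türkiye rise by a little over 2 percent a month, compared with about a quarter of a percent a month in New Zealand recently, so a cheaper source of groceries is worth a lot more effort.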

All of that is to say that Türkiye’s much higher inflation creates much stronger incentives for consumers to incur shoe-leather costs to avoid higher prices than is currently the case in New Zealand.

[HT: New Zealand Herald, also paywalled]

Friday, 29 May 2026

This week in research #128

Here's what caught my eye in research over the past week:

  • Ruggles tests Richard Easterlin's argument that the economic and social prospects of a generation are influenced by the size of the cohort relative to adjacent cohorts, and finds using US data from 1910 to 2040 that the theory fits the data well for the period from 1940 to 1980 but fails in later decades, although baby boomers exiting the labour force will likely lead to increases in wages in the future
  • de Bondt and Sun (with ungated earlier version here) use ChatGPT to classify activity sentiment scores from Purchasing Managers’ Index (PMI) news releases, then use those scores to 'nowcast' GDP, finding that, on average, out-of-sample forecast accuracy improves by about 20%, except in the two most recent years
  • Skali et al. (open access) find that better-looking Swiss politicians are not more prone to rent-seeking through interest group affiliations, and do not deviate more from their voters' preferences
  • Jin, Karim, and Schulze (open access) find that Islamist terror attacks created significant negative abnormal returns in American and European markets, but the stock market effects of other terror attacks were almost nil

In other news, I wrote a quick take on the New Zealand Budget as part of The Conversation's coverage this week. That article also has a drop-down menu at the bottom that summarises the key Budget announcements in each area.