Sunday, 30 April 2023

Scalping Girl Scout cookies

Ticket scalping is a favourite and recurring theme in the media (for example, see this piece on Stuff from February this year). It is also one of the areas where economists and the general public most disagree (other possibilities for that title might include the gains from trade, and rent controls). The key difference is that the general public's view is that ticket scalping is unfair and exploitative, while the view of many economists is that ticket scalping simply represents an expected market activity when the initial price is set too low.

I've written in detail about ticket scalping before (see here), so I'm not going to repeat those points. Instead, I want to note that scalping is not an activity that is limited to tickets to concerts or sporting events. It can happen whenever the price is set too low, leading to a shortage. For example, I've written before about scalping of Ontario camping sites. And now we have the example of Girl Scout cookies. As reported in the New York Times last month:

Samoas, Trefoils and Thin Mints, move over. A new Girl Scout cookie flavor, Raspberry Rally, is in such high demand that, after swiftly selling out online, boxes are now being peddled for far higher prices on resale websites.

Single boxes of the cookies, which have a crispy raspberry-flavored center coated in chocolate, cost from $4 to $7, but they are selling for as much as five times the usual price on the secondary market.

Girl Scouts of the U.S.A. has expressed dismay over the situation. The organization said in a statement that most local Girl Scout troops had sold out of the “extremely popular” Raspberry Rally cookies for the season and emphasized that it was “disappointed” to see unauthorized resales of the flavor.

“While we are happy that there’s such a strong demand for our cookies year over year,” the Girl Scouts said, “we’re saddened that the platforms and the sellers are disregarding the core mission of the cookie program and are looking to make a profit off of the name without supporting our mission and the largest girl-led entrepreneurship program in the world.”

The third-party sellers have “deprived” troops of valuable experience and of proceeds that fund “critical programming,” the organization said. The organization encouraged people to support Girl Scout troops by purchasing one of the many other available flavors.

The Girl Scouts have this exactly wrong. The resellers are not depriving the Girl Scout troops of anything. The resellers bought the cookies from the Girl Scouts legitimately, at the price that the Girl Scouts set for their cookies. If anyone deprived Girl Scout troops of proceeds to fund critical programming, it is the Girl Scouts themselves. They should have set the price higher, and they would have made more profit from the cookies. The actions of the resellers demonstrate this clearly. If the Girl Scouts had set the price higher initially, at a price that equalised the quantity demanded with the number of cookies available, there would be no profit opportunity for the resellers to exploit.

To reiterate, scalping (including the scalping of Girl Scout cookies) represents an expected market response to a good that was initially priced too low. From the article:

For more than a century, the Girl Scouts have been holding annual cookie sales to raise money for troop activities while helping scouts learn skills like marketing, goal-setting and budgeting.

Maybe they should be helping the scouts to learn about pricing as well?

[HT: Marginal Revolution]

Friday, 28 April 2023

Excess demand for the Great Walks continues

In an opinion piece in the New Zealand Herald today, Thomas Bywater wrote:

The annual “bun fight” for bunks on the Milford and Routeburn tracks has become something of a tradition. Thousands of hopefuls log-in on opening day to try and book one of the 120 bunks on the “finest walks” in the world. Since moving to the online booking system, it’s become a bit of a lottery...

Many put the blame squarely on DoC for ruining their tramping holiday. Particularly international walkers, who said they had stayed up into the small hours of the morning to try and secure a place.

Bywater's solution to the problem is to create more Great Walks:

The only way to increase the number of bunks on the Great Walk network is to increase the number of Great Walks.

It’s a solution that the Department has only recently reached, with the addition of the Paparoa in 2019. As the fourth most well-subscribed trail on the network the West Coast trail has been a huge success.

That is only one way, not the only way, to improve things. Another is to recognise that, when there are more people wanting to buy a good or service than there is capacity to provide it, that means that there is excess demand for the good or service. Excess demand arises when the price is below the equilibrium price (the price that would equate the quantity demanded and quantity supplied of the good or service). This situation is shown in the diagram below. At the current market price for the Great Walks of P0, the quantity of huts demanded is QD, while the quantity of huts supplied (available) is QS. Since QD is greater than QS, there is excess demand (a shortage).

How do you get rid of excess demand? You allow the price to increase. If the price was P1 instead of P0, then both the quantity of huts demanded and the quantity of huts supplied would be Q1. There would be no more excess demand. Every tramper who was willing to pay P1 for a hut would get one. This is a point that I have made before (in relation to the free pricing of the Great Walks, rather than the price of huts). There are no good options for managing excess demand - either the price needs to increase, or some people are going to miss out.
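
To make that concrete, here is a minimal sketch with an invented linear demand curve and the 120 bunks mentioned in the quote above (the numbers are illustrative only, not DOC's actual demand):

```python
# Toy linear demand for Great Walk bunks (invented numbers, not DOC's).
def quantity_demanded(price):
    return 200 - 10 * price  # Qd = 200 - 10P

CAPACITY = 120  # Qs: bunks available per night (fixed in the short run)

p0 = 5  # administered price, set below the market-clearing level
shortage = quantity_demanded(p0) - CAPACITY
print(f"At P0={p0}: Qd={quantity_demanded(p0)}, Qs={CAPACITY}, shortage={shortage}")

# The market-clearing price P1 solves Qd = Qs: 200 - 10*P1 = 120, so P1 = 8
p1 = (200 - CAPACITY) / 10
print(f"At P1={p1}: Qd={quantity_demanded(p1)}, shortage=0")
```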

Building new Great Walks is a great idea in its own right. However, it will only reduce demand for the Routeburn or Milford Tracks to the extent that the new Great Walk is a substitute for them. That the Paparoa Track quickly became the 'fourth most well-subscribed trail' and yet we still have serious excess demand for other Great Walks doesn't provide a strong endorsement of new tracks as a solution. Instead, it is more likely that the addition of new tracks simply adds new demand to the system as well as new supply.

On the plus side, I was happy to see this bit from Bywater's article:

For the first time since the pandemic, international visitors were able to vie for a place, albeit at a higher rate than domestic visitors. From those that were able to book a place on the Milford Track, last week, 35 per cent were from overseas.

Finally, we have price discrimination that favours domestic tourists over international tourists (as I have argued for before - see here and here). Now, we just need the prices (for both domestic and international trampers) to rise some more.

Thursday, 27 April 2023

AI may be the killer app, as well as the killer of (dating) apps

In a post a couple of weeks ago about who gains, and who loses, status as a result of ChatGPT (and other large language models), I wrote:

There is one more context I want to highlight, which is a particular favourite of mine when teaching signalling to first-year students: online dating. It is difficult for a person to signal their quality as a potential date on a dating app. Anyone can write a good profile, and use a stock photo. However, one of the few signals that might be effective is the conversation on the app before a first date. A 'good' date should be able to set themselves apart from a 'not-so-good' date, by the things they say during a conversation. However, with ChatGPT in the picture, the signalling value of what people write in dating app conversations is reduced (in contrast to the assertions in this article in The Conversation). I wonder how long it will be before we end up in a situation where one instance of ChatGPT is talking to another instance of ChatGPT, because both dating app users are using ChatGPT at the same time (it has probably happened already). Anyway, good quality dates will lose status as well.

It turns out that I was almost certainly right that ChatGPTs are talking to each other on dating apps. The Washington Post reported earlier this week (paywalled):

Coyne Lloyd, a 35-year-old tech investor, was visiting his family in Upstate New York recently when he decided to set up some dates in the city. He fired up Hinge, his preferred dating app, and swiped on a few interesting women. After receiving a couple of matches, he turned, out of curiosity, to a new AI dating tool called Rizz to break the ice...

Rizz, which is meant to function as a digital wingman, helps users come up with killer opening lines and responses to potential matches. The company behind it is just one of many start-ups trying to transform romance through artificial intelligence by optimizing and automating online dating, now one of the primary ways by which people find romantic connections...

Using dating apps can be a slog. Some people complain that they have to sift through countless matches as others indiscriminately swipe; it is difficult to start conversations with strangers; and many users end up viewing the apps more as a necessary chore than an exciting opportunity to connect with someone new...

That is what drove Dmitri Mirakyan, 28, a data scientist in New York, to develop YourMove.ai, an AI dating tool that helps users begin and respond to messages. “This past summer, I got really tired of sifting through and trying to come up with responses on dating apps,” he said. “So I tried to see if GPT3 could flirt. It turns out it could. A month later, I built the first version [of the platform] on a Saturday.”

Using dating apps is costly. It takes time and effort. However, it is precisely the time and effort involved that makes messaging in a dating app a signal of a date's quality. The messaging function in a dating app provided an opportunity for high-quality dates to signal that they were high quality. High-quality dates are willing to put in the time and effort to write appealing messages. Low-quality dates are not as willing to put in the time and effort. If you eliminate the time and effort required in messaging through a dating app, you eliminate the value of the signal. You end up back in a pooling equilibrium, where anyone you exchange messages with in the app is just as likely to be low-quality (and using AI) as they are to be high-quality. Or worse. Maybe the availability of AI for dating apps crowds out the high-quality dates, leaving only low-quality dates behind. In that case, every person you exchange messages with in the app is low quality.

It is difficult to see how dating apps can restore the previous separating equilibrium that in-app messaging allowed for (separating the high-quality dates from the low-quality dates). I suggested in my earlier post that in-person meet-ups could become even more important as a tool for sorting out high-quality dates from low-quality dates. That is costly as well, and in an age where ChatGPT can be projected onto a pair of glasses in real time, even the conversation in a real-life meet-up may not provide a good signal.

AI may be the killer app of the moment, but it also appears that it may be the killer of (dating) apps. However, markets do adapt. It will be interesting to see how the dating market adapts.

[HT: Marginal Revolution]

Tuesday, 25 April 2023

Reason to be cautious with the inverse hyperbolic sine transformation

Trigger warning: This post is more technical than my usual posts.

Economists often transform data (on incomes, for example) by taking logarithms. This has statistical advantages, in terms of making the distribution of otherwise skewed variables behave better in the analysis. It also has a neat property in terms of the interpretation of regression coefficients, because in a log-linear model (where the dependent variable is measured in logs and the explanatory variable is not) the coefficient can be interpreted as a percentage change, and in a log-log model (where both the dependent and explanatory variables are measured in logs) the coefficient is an elasticity.
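
As a quick simulated check of the log-linear interpretation (entirely made-up data, with a true effect of five percent per unit of x built in):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 10_000)
# Log-linear data-generating process: each unit of x raises y by ~5 per cent
y = np.exp(1.0 + 0.05 * x) * rng.lognormal(0.0, 0.1, x.size)

b1 = np.polyfit(x, np.log(y), 1)[0]  # slope of the log-linear regression
print(round(b1, 3))  # ~0.050, i.e. about a 5 per cent change in y per unit of x
```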

However, there is a problem with the log transformation. Any value of zero (or a negative number) is undefined, and this makes some analyses challenging. For example, in gravity models of trade or migration, small areas that are far apart may have zero flows between them. Since the gravity model relies on logs of trade or migration flows, the zero values cause a problem. Or, if you want to estimate the effect of some programme for underemployed youth on employment income, you would often use the log of income as the dependent variable. However, unemployed people may have zero reported income, and those zero values cause a problem.

There are few good ways of dealing with the problem of zeroes or negative values in a variable that you want to log-transform. You could drop all negative or zero values, but that decreases the sample size and likely biases your results (because observations that have zeroes or negative values are usually different in meaningful ways from those that have positive non-zero values). Another option is to compute ln(X+1) rather than ln(X) when log-transforming the variable X. That deals with zeroes, but not large negative numbers, and it also biases the results (but probably not as much as simply dropping data would).

An alternative transformation that has gained some traction in recent years is the inverse hyperbolic sine (asinh) transformation. That transformation involves computing asinh(X) = ln(X + (X^2 + 1)^(1/2)), which is not quite as complicated as it seems. It deals with variables with zero values (but not large negative values). Moreover, it has been argued that coefficients on variables transformed in this way have the same interpretations as coefficients on log-transformed variables.

However, all may not be as rosy as it seems. This blog post by David McKenzie at the Development Impact blog suggests that we should be much more cautious with the asinh transformation. The post draws on a variety of recent articles and working papers that have investigated the asinh transformation and its properties. The first problem is that the transformation is really sensitive to the units of measurement, such that measuring a variable in dollars can result in different coefficient estimates than measuring it in thousands or millions of dollars. That should not be the case when the coefficient is supposed to be interpreted as a percentage or an elasticity!
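
The contrast with logs is easy to see numerically. Rescaling a logged variable by 100 shifts every observation by the same constant, ln(100), so regression slopes are unchanged; rescaling an asinh-transformed variable shifts observations by different amounts, especially near zero (a minimal sketch):

```python
import numpy as np

def asinh(x):
    return np.log(x + np.sqrt(x**2 + 1))  # same as np.arcsinh

x = np.array([0.0, 0.5, 5.0, 500.0])

print(np.log(100 * x[1:]) - np.log(x[1:]))  # constant: ln(100) ~ 4.605 everywhere
print(asinh(100 * x) - asinh(x))            # 0.00, 4.12, 4.60, 4.61: not constant near zero
```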

The kicker may be this bit:

Chen and Roth re-estimate 10 papers published in the AER that used the i.h.s transformation for at least one outcome, and illustrate how re-scaling the outcome units by 100 can lead to a change of more than 100% in the estimated treatment effect – with the largest changes coming for programs that had impacts on the extensive margin. E.g. In Rogall (2021)’s work on the Rwandan genocide, he looks at how the presence of armed groups fosters civilian participation in the violence. The extensive margin effect is 0.195, so a big extensive margin change. The estimated treatment effect then changes from 1.248 to 2.15 depending on whether y or 100*y is used as the outcome – which implies a massive change in the implied percentage change effect if interpreting these as either log points or like a log variable.

The Chen and Roth working paper that McKenzie refers to is available here. Given how often this transformation has been used in recent times, I had recently added it to my personal econometrics cheat sheet. However, I haven't felt the need to use it in my own work as yet (because, in gravity models for example, we tend to use Poisson pseudo-maximum likelihood (PPML), which deals with zero values better than the alternatives to log-transformation). I've now had to go back and footnote my cheat sheet with a cautionary note.

And that is probably the takeaway from McKenzie's post (and the papers he cites there), although he does provide some suggested ways of proceeding (adapted from the Chen and Roth working paper). I prefer to just suggest that when we use the asinh transformation, we need to be cautious.

Monday, 24 April 2023

No rain in Spain leads to olive oil pain

The Financial Times reported yesterday (paywalled):

A lack of rain in Spain has pushed prices for olive oil to record levels, with analysts warning that a particularly dry summer could lead to even lower crop yields later this year.

Olive oil prices have surged almost 60 per cent since June to roughly €5.4 per kilogramme, on the back of a severe drought in Europe that last year ruined olive crops across the continent.

Spain, the largest olive oil producer, was hit particularly hard. The country’s farmers typically produce half of the world’s olive oil, though annual supplies have roughly halved to about 780,000 tonnes in the past 12 months.

This provides a timely example, given that my ECONS101 class has been covering the model of demand and supply this week. Consider the effect of the lack of rain on the market for olives in Spain, shown in the diagram below (we'll come to the market for olive oil a bit later). The olive market was initially in equilibrium, where demand D0 meets supply S0, with a price of P0 and Q0 olives are traded. A lack of rain in Spain reduces the amount of olives available to harvest, decreasing supply to S1. This increases the equilibrium price of olives to P1, and reduces the quantity of olives traded to Q1.

Now consider what that means for the market for olive oil. Olives are the main input into the production of olive oil (duh!). The price of olives has increased, which makes producing olive oil more costly. Higher production costs reduce the supply of olive oil. The diagram for the market for olive oil looks the same as the market for olives shown above - a decrease in supply, leading to an increase in price (to a price of €5.4 per kg in the article), and a decrease in the quantity of olive oil traded (to 780,000 tonnes in the article).
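
The pass-through logic can be sketched with a toy linear model (all of the parameters below are invented for illustration, including the assumption that olive oil makers pass the olive price through one-for-one):

```python
# Toy linked markets (all parameters invented for illustration).
# Olive market: Qd = 100 - 2P, Qs = a + 3P, where drought lowers the intercept a.
def olive_price(a):
    # Equilibrium solves 100 - 2P = a + 3P, so P = (100 - a) / 5
    return (100 - a) / 5

p_before = olive_price(a=50)  # normal harvest
p_after = olive_price(a=20)   # drought shifts olive supply to the left

# Simplification: olive oil makers pass the olive price through one-for-one,
# on top of a fixed margin covering their other costs.
MARGIN = 2.0
print(f"olive price: {p_before} -> {p_after}")                        # 10.0 -> 16.0
print(f"olive oil price: {p_before + MARGIN} -> {p_after + MARGIN}")  # 12.0 -> 18.0
```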

So, Spain's lack of rain will be passed on to olive oil consumers, in the form of higher prices.

Sunday, 23 April 2023

Take care with ILO's broken labour force data series

One of the first rules of working with real-world data is to graph it. That allows us to see where the data has weird inconsistencies, such as those described in this blog post by Kathleen Beegle over on the Development Impact blog:

Has the share of women in the labor force in Rwanda fallen from 84% in 2014 to 52% in 2019? I highly doubt it. More likely, we are seeing the consequence of a major change in the internationally agreed-upon statistical definition of employment. Yes, really. It is quite likely to be a change of which mainly statistical-type-super-data-nerd economists would be aware. But it is one that all of us might want or need to be aware of, especially if you want to properly use country statistics and/or benchmark your own surveys with estimates from national statistical agencies.

In 2013, the 19th International Conference of Labour Statisticians (ICLS) redefined several key labor statistics. It’s taken a few years for these new definitions/concepts to get integrated into questionnaires, into survey efforts, and, as I suspect above, to show up in country statistics. The ICLS19 made several changes but here I focus on one specific change: employment is now work for pay or profit. But wasn’t that what it was before? Not quite. A key feature to this change is that work that is mainly intended for “own-use production” is now excluded, where it was counted as employed before. Before ICLS19, production of primary products, whether for market or household consumption, was counted as employment. The new definition means that someone farming mainly for family consumption (i.e. subsistence farmers) is no longer “employed”. (Though they could still be employed by the new definition if they have another job that qualifies, and in the labor force if they are available and actively searching for work in the form of pay or profit). So subsistence farmers (or those otherwise growing crops mainly intended for home consumption) are “working”, but not “employed”. I put quotes to emphasize these specific terms.

Beegle identifies an interesting phenomenon, and one that we should be careful about - statistical agencies changing the definition of variables that we use. Our analyses would be confounded by these definitional changes if we don't account for them in some way. Now, if all statistical agencies applied the new definition at the same time, that would be easy, but it appears they haven't. Looking at the World Development Indicators, here's the data for five countries: (1) Rwanda; (2) Niger; (3) Papua New Guinea; (4) Benin; and (5) Cameroon (original data are here).

I deliberately chose these five countries as they all seem to experience a similar transition from a high steady-state labour force participation rate to a lower steady-state labour force participation rate. The problem is that all five countries make this transition at different times. So, the usual ways that we would deal with a break in the time series (such as by using a dummy variable for before/after the change in definition, or a before/after dummy variable interacted with a dummy variable for each country) simply isn't going to work well.

This graph also highlights something else, which is no doubt a feature of the underlying ILOSTAT data. Each transition is far too smooth. It is like a straight line has been drawn from the initial steady state series to the start of the new series. No doubt this period of linear transition is actually masking missing data between two consecutive labour force surveys.
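
Here is a stylised reconstruction of what that smoothing looks like, using the Rwanda figures from the quote above (84 per cent at the 2014 survey, 52 per cent at the 2019 survey) and straight-line interpolation in between; the actual modelled ILO estimates will differ in their details:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

years = np.arange(2005, 2023)
lfp = pd.Series(np.nan, index=years)
lfp[years <= 2014] = 84.0  # old (pre-ICLS19) definition, last survey year 2014
lfp[years >= 2019] = 52.0  # new definition, first survey year 2019

# Straight-line interpolation across the survey gap reproduces the
# suspiciously smooth transition visible in the published series.
smoothed = lfp.interpolate()

smoothed.plot(marker="o")
plt.ylabel("Female labour force participation (%)")
plt.show()
```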

We would need to take great care when using the annual data, because there is a great degree of measurement error created by the change in definition, as well as the way the data series have been smoothed between each labour force survey. This is a timely reminder of why the first step in any data analysis should be to graph the data, to help us understand what we are working with.

Saturday, 22 April 2023

Book review: The Voltage Effect

Some ideas (in business, policy, or other fields) work well initially, but fail spectacularly when they are rolled out more widely. Other ideas work fantastically well. What determines the success or failure of an idea to scale? That is the question underlying the wonderful book The Voltage Effect, by University of Chicago economist John List.

List has had an incredibly interesting and varied career in economics. After completing his PhD in economics at the University of Wyoming, List was hired by the University of Central Florida (the only programme that gave him an offer). He worked his way from there to eventually land at the highly ranked Economics Department at the University of Chicago in 2004. More recently, alongside his academic appointment, he has been Chief Economist at both Uber and Lyft (and, since 2022, has been Chief Economist for Walmart, although that appointment occurred after this book was written). List draws on all of that experience, which includes an incredibly varied array of field experimental research, to populate the book with an array of engaging examples. I especially loved the stories that came from List's own life, including his experiences as a forklift driver at a Wisconsin cheesemaker, and as a collegiate golf player.

The purpose of the book, though, is to outline List's (evidence-based) view of what contributes to the success, or failure, of an idea to scale. This is nicely summarised in the last section of the book, which talks about what Jared Diamond referred to in his book Guns, Germs and Steel as the 'Anna Karenina principle': that any of a number of deficiencies could lead an idea to fail. Or, as List writes:

The secret to scaling isn't having any one silver bullet. There are multiple ways an idea can fail at scale, and to achieve high voltage, you must check each of the Five Vital Signs: false positives, misjudging the representativeness of an initial population or situation, spillovers, and prohibitive costs. Any one of these alone can sink your ship.

Now, rather than simply lay out how ideas fail to scale, List also helpfully presents several strategies that can be employed to help ideas to scale:

Once you clear these five hurdles, however, there is more you can do to improve your probability of success at scale. You can design the right incentives, use marginal thinking to make the most of your resources, and stay lean and effective as you grow. You can make decisions based on the opportunity cost of your time, discover your comparative advantage, and learn to optimally quit, allowing you to unapologetically cut your losses and move on to new and better ideas when appropriate. And you can build a diverse and dynamic organizational culture based on trust and cooperation, rather than competition and individualism.

Economics students would recognise a lot in those strategies that comes from basic economic principles. That should be no surprise, given List's disciplinary background, and these are all economic principles that are underappreciated in the general population.

Even aside from some basic economic principles, there is a lot to learn from the book, especially from the evidence embedded within the examples that List uses. For example, reducing class sizes in schools, as recently suggested by the New Zealand government, may fail to scale. That's because it requires the hiring of more teachers, and the new teachers are likely to be less effective teachers than those who are already teaching. That's where 'marginal thinking' becomes important.

I really enjoyed reading this book. List writes in an engaging style that is easy to read, which is uncommon among top economists (and I've previously noted List as one of my favourites for a future Nobel Prize - see here and here). This book should be required reading for businesspeople and policymakers, who are thinking about scaling their ideas. It is also a great read for economics and business students more generally. Recommended!

Friday, 21 April 2023

Ethnic discrimination among young children

In yesterday's post, I discussed the idea of taste-based discrimination. Such discrimination is effectively a prejudice against members of particular population groups (such as a particular ethnic group or gender). On that theme, I recently read two research articles that looked at ethnic discrimination among children.

The first article was this one by Jane Friesen (Simon Fraser University) and co-authors, published in the Journal of Economic Psychology in 2012 (ungated earlier version here). Their sample was 430 Grade 1 and 2 children (aged 5-8 years) from Vancouver. Friesen et al. ran an experiment that involved two tasks. The second task (a 'sharing task') was the most relevant to this post. The sharing task was a variant of the dictator game, which is a common experimental game used to examine norms of fairness within a population. In this case, each child got to share 12 stickers among four people (themselves, a White child from their class, an East Asian child from their class, and a South Asian child from their class). The task was repeated three times for each child. 

Looking at the results of the sharing task, Friesen et al. find that:

Overall, participants share on average 13.6/36 stickers or 38% of their endowment. South Asian children share fewer stickers overall (11.7) compared to Whites (13.8) and East Asians (13.7)...

...50% of participants chose a non-discriminatory allocation (including 5.2% of participants who shared zero stickers)... White participants were substantially more likely (55%) than East Asian (45%) and South Asian (40%) participants to choose a non-discriminatory allocation, and girls were more likely (54%) than boys (42%) to do so...

...on average, White participants shared slightly more stickers with the recipient from their own ethnic category than with the other two; East Asians participants shared slightly fewer with their own category than with the other two; and South Asian participant shared slightly fewer with the East Asian recipient than with the other two.

That provides some evidence in favour of discrimination. Going a bit further, Friesen et al. then run a regression model on their data, and find that:

...White subjects share slightly less than one-third of a sticker more on average with the White target than with either of the other two recipients... However, East Asian participants show no in-group bias; if anything, they share fewer stickers with their own group than with other groups... This difference in patterns of in-group bias between Whites and East Asians is statistically significant... the relevant point estimates indicate that South Asians may show somewhat less in-group bias than Whites in terms of the number of stickers shared... but this difference is not statistically significant.

In other words, there is evidence for ethnic discrimination among White and South Asian children, but not among East Asian children. There were no differences in the degree of discrimination between girls and boys.

The second article was this one by Annika List, John List (both University of Chicago), and Anya Samek (University of Southern California), published in 2017 in the journal Economics Letters (ungated earlier version here). List et al. report on a field experiment among children aged three to five years, which also involved an application of the dictator game. In this case:

...children were matched to teddy bears or other students and decided how many of their marshmallows to send them. We unobtrusively indicated the race of the match by showing pictures of hands (lighter or darker skin color) or pictures of teddy bear paws (light or dark brown).

The choice to use teddy bears as well as human hands is novel, and explained as:

...we use the teddy bears as a control in order to rule out that preferences are driven solely by dislike for darker or lighter colors. By comparing the aversion to giving to a darker color hand person relative to darker paw teddy bear, we disentangle the role of racial discrimination from preferences for colors in children’s choices.

Based on their sample of 117 children, each completing four rounds of the dictator game, List et al. find that:

On average, white children send 0.97 marshmallows to white recipients and 1.47 marshmallows to black recipients. Similarly, Hispanic children send 1.18 marshmallows to white recipients and 1.55 marshmallows to black recipients. Alternatively, black children send more marshmallows to white recipients (1.33) relative to other blacks (0.99)...

Contrary to our expectation, we do not see a difference when comparing giving to teddy bears versus to human children. 

In other words, this study provides little evidence of ethnic discrimination among the youngest children. 

These two studies piqued my curiosity because it seems somewhat obvious that ethnic discrimination develops over the course of a person's childhood. Taken together, these two studies provide some support for that view. I was a little surprised that there wasn't already a well-established literature in this space (see this 2008 article in the journal Nature for more). However, these two studies tell us little about what contributes to discrimination among children, or how it can be prevented. That gives a lot of scope for future research in this space.

Wednesday, 19 April 2023

Price discrimination and taste-based discrimination, in the same market

Price discrimination occurs when a seller sells the same good or service to different customers for different prices, and where the difference in price doesn't arise from a difference in costs. There are lots of examples of price discrimination in the real world (see some of my posts here, and here, and here, for example). Many people misunderstand price discrimination, conflating it with the types of behaviour that the word 'discrimination' is often used to describe, like racial discrimination or gender discrimination. However, more often than not price discrimination doesn't mean that the firm is pricing differently for different genders or races (although it does happen).

To make things even more confusing, discrimination in pricing on the basis of personal characteristics does sometimes happen, and it can happen in markets that also feature price discrimination (as described above). Take this 2018 article by Huailu Li (Fudan University), Kevin Lang (Boston University), and Kaiwen Leong (Nanyang Technological University), published in The Economic Journal (ungated earlier version here). They investigated discrimination in the commercial sex market in the Geylang district of Singapore, using a survey of 176 sex workers and a dataset based on 814 transactions between those sex workers and their clients (taken from the last four to seven transactions for each sex worker that was surveyed). Specifically, Li et al. expected to find that:

...based on beliefs about their willingness to pay, sex workers would ask for higher prices from white clients than from Chinese clients who, in turn, would be asked for more than Bangladeshi clients... We also anticipated that they would charge high prices to Indian clients, the primary client group with darker skin tones (taste discrimination).

Notice that the first expectation there is price discrimination, in a similar way to what we often see in other markets - consumers with a higher willingness-to-pay, or with less elastic demand (less sensitivity to price), are charged a higher price than consumers with a lower willingness-to-pay or more elastic demand. Li et al. refer to this as 'statistical discrimination', since it is based on statistical differences between groups (in their willingness-to-pay for sex services). The second expectation (referred to as 'taste-based discrimination') is purely based on the preferences of the seller (it is what most people would recognise as discrimination). In relation to this expectation, Li et al. first report that:

The sex workers were asked to rate different ethnicities on a scale of 1 (dislike) to 5 (like very much) with 3 being ‘like.’ They consistently give high ratings to Chinese (4.2) and white (3.9) clients... In contrast, the Bangladeshi and Indian clients earn average ratings of 3.1 and 2.1...

Does that difference in sex worker preferences transfer into pricing differences? That is, do sex workers charge higher prices to clients from ethnic groups that they like less, and lower prices to clients from ethnic groups that they like more? Li et al. find:

...robust evidence that sex workers are less likely to approach Indians and that they are less likely to reach an agreement. We do not confirm the expectation of a higher price relative to Chinese clients; the initial prices demanded of the two ethnicities are similar, perhaps because sex workers also believe that Indian clients have a relatively low willingness to pay, a belief that would be consistent with the low offers made by Indians when they make the first offer. Consistent with our expectations, Indians pay a premium relative to Bangladeshis.

On the other hand, there is also statistical discrimination as well, because:

Relative to the base group (Chinese), the same sex worker suggests an initial price to whites with an 11% (10 log points) premium and gives Bangladeshis a 13% discount on the initial price offer, thus asking whites for almost 30% more than she asks from Bangladeshis.

So, there is both statistical discrimination (price discrimination based on perceived differences in willingness-to-pay), and taste-based discrimination, in this market. How confusing!

Tuesday, 18 April 2023

Doing qualitative research at scale?

Qualitative research usually involves much smaller sample sizes than quantitative research. That is typically a necessary response to the differences in the nature of the research. Quantitative research analyses datasets in such a way that a dataset that is ten times larger does not take ten times as long to analyse. In contrast, qualitative research takes much longer when the number of research participants is greater. In other words, quantitative research scales much more easily than qualitative research does.

The main constraint that makes qualitative research less scalable than quantitative research is researcher time and effort. In qualitative research, every interview, focus group, participant observation, or whatever the unit of analysis is, must be analysed individually by the researcher. Unlike quantitative research, this process cannot be automated easily. Sure, there are tools to help with coding qualitative data, but they help with the management of the process, more so than the analysis itself. On top of that, different researchers may code the data in slightly different ways, meaning that it is not easy to increase the size of qualitative research by simply increasing the number of team members. The larger the qualitative research team, the larger the discrepancies between different coders are likely to be.

But, what if there was a way to easily scale qualitative research, allowing a smaller number of researchers (perhaps even one) to analyse a larger number of observations? Wouldn't that be great? That is the premise behind this paper, discussed in a relatively accessible way in this blog post on the Development Impact blog by Julian Ashwin and Vijayendra Rao (two of the seven co-authors of the paper). Specifically, they:

...develop a “supervised” [Natural Language Processing] method that allows open-ended interviews, and other forms of text, to be analyzed using interpretative human coding. As supervised methods require documents to be “labelled”, we use interpretative human coding to generate these labels, thus following the logic of traditional qualitative analysis as closely as possible. Briefly, a sub-sample of the transcripts of open-ended interviews are coded by a small team of trained coders who read the transcripts, decide on a “coding-tree,” and then code the transcripts using qualitative analysis software which is designed for this purpose. This human coded sub-sample is then used as a training set to predict the codes on the full, statistically representative sample. The annotated data on the “enhanced” sample is then analyzed using standard statistical analysis, correcting for the additional noise introduced by the predictions. Our method allows social scientists to analyze representative samples of open-ended qualitative interviews, and to do so by inductively creating a coding structure that emerges from a close, human reading of a sub-sample of interviews that are then used to predict codes on the larger sample. We see this as an organic extension of traditional, interpretative, human-coded qualitative analysis, but done at scale.
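
In code, the general shape of the idea looks something like the sketch below, with a generic TF-IDF classifier standing in for whatever NLP model the authors actually use (the transcripts and the 'high aspirations' label are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sub-sample coded by trained human coders (label: 1 = expresses high aspirations)
coded_texts = [
    "I want my daughter to finish university and become a doctor",
    "My children should study hard and go far in life",
    "There is no point planning for school, we cannot afford it",
    "We just hope to get by day to day",
]
coded_labels = [1, 1, 0, 0]

# The rest of the representative sample, left uncoded by humans
uncoded_texts = ["My son should get a good education and a good job one day"]

# Train on the human-coded labels, then predict codes for the full sample;
# the paper then runs standard statistical analysis on the predicted codes,
# correcting for the noise that the prediction step introduces.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(coded_texts, coded_labels)
predicted_codes = model.predict(uncoded_texts)
print(predicted_codes)
```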

Natural Language Processing (NLP) is one of the cool new toys in the quantitative researcher's toolkit. It allows the analysis of "text as data" (see this paper in the Journal of Economic Literature, or this ungated earlier version, for a review). NLP models have been used in a wide range of applications, such as evaluating the effect of media sentiment on the stock market, or using web search data to estimate corruption in US cities.

Anyway, back to the paper at hand. Ashwin and Rao report that in their paper they:

...apply this method to study parents’ aspirations for their children by analyzing data from open-ended interviews conducted on a sample of approximately 2,200 Rohingya refugees and their Bangladeshi hosts in Cox’s Bazaar, Bangladesh.

The actual application itself is less important here than the method, about which they conclude that:

This illustrates the key advantage of our method – we are able to use the nuanced and detailed codes that emerge from interpretative qualitative analysis, but at a sample size that allows for statistical inference.

Now, I don't doubt that this is a very efficient way of analysing a lot of qualitative data, without the need for a huge amount of researcher time. However, I imagine that many qualitative researchers would argue that the method that this paper employs is not qualitative research at all. It simply takes qualitative data, constructs quantitative measures from it, and then applies quantitative analysis methods to it. This is a trap that many quantitative researchers fall into when faced with open-ended survey responses. Indeed, it is a trap that I have fallen into myself in the past, so now I partner with researchers with skills in applying qualitative methods when undertaking research that involves both qualitative and quantitative analyses (see here and here, for example). However, Ashwin and Rao are not oblivious to this problem:

Unsupervised NLP analysis provides too coarse of a decomposition of the text, which may not be suited to many research questions, as we show by comparing our results to those using a Structural Topic Model. This topic model shows that there are clearly differences in the language used by, for instance, hosts and refugees. However, interpreting these differences in terms of aspirations, ambition and navigational capacity is difficult. Unsupervised methods can thus uncover interesting dimensions of variation in text data, but they will often not give interpretable answers to specific research questions.

So, indeed there is still a role for qualitative researchers. At least, until ChatGPT takes over, at which time qualitative research may truly scale.

Monday, 17 April 2023

Web of Science takes up the battle against predatory publishers

Science reported last month:

Nearly two dozen journals from two of the fastest growing open-access publishers, including one of the world’s largest journals by volume, will no longer receive a key scholarly imprimatur. On 20 March, the Web of Science database said it delisted the journals along with dozens of others, stripping them of an impact factor, the citation-based measure of quality that, although controversial, carries weight with authors and institutions. The move highlights continuing debate about a business model marked by high volumes of articles, ostensibly chosen for scientific soundness rather than novelty, and the practice by some open-access publishers of recruiting large numbers of articles for guest-edited special issues...

Clarivate initially did not name any of the delisted journals or provide specific reasons. But it confirmed to Science the identities of 19 Hindawi journals and two MDPI titles after reports circulated about their removals. The MDPI journals include the International Journal of Environmental Research and Public Health, which published about 17,000 articles last year. In 2022, it had a Web of Science journal impact factor of 4.614, in the top half of all journals in the field of public health.

This is not the first time that MDPI has been singled out for dodgy publishing practices (see my post on this from 2021). Many of these journals operate a pay-to-publish model, which is antithetical to genuine and high-quality scholarly research. They incessantly spam academics with requests to join editorial boards, become a guest editor, submit papers (most often to special issues of dubious validity), and review papers.

To give you a sense of the scale of the spam, consider my own experience. I have blocked Hindawi and Frontiers from my emails, but I still receive requests from MDPI. Since the start of this year, I have received two requests to be a guest editor, six calls for papers for special issues, and seven requests to review papers. I rejected all of them. The requests to review are about one third of all of the requests to review papers that I have received this year, just from one publisher. And most of them were for articles where I am not an expert.

It is that last point that is of most concern. Reviewers are supposed to be acknowledged experts in the field. Otherwise, the quality of review is unlikely to be good, which limits the validity of the editorial process. This is compounded by the seven-day turnaround that these publishers expect from their reviewers.

Academics need to be vigilant in standing up to these predators, and in coaching their postgraduate students and junior colleagues to avoid these journals as outlets for publication. As the article linked above notes, Web of Science has stripped some journals of their impact factors, which limits their value in tenure or promotion decisions. One Chinese university has stopped counting publications with Hindawi, MDPI, and Frontiers in evaluating academic staff for career progression. The Norwegian National Publication Committee now gives no credit for publications at Frontiers. No doubt, there will be more sanctions to come. The negative signal from being associated with these journals could damage new academics' blossoming careers, unless they can successfully avoid the temptation to get involved with them.

[HT: Peter Newman]

Saturday, 15 April 2023

What would happen if perfect personalised pricing was possible?

In my ECONS101 class, we discuss price discrimination, which comes in three forms: (1) first-degree price discrimination (or personalised pricing), which involves setting a different price for every consumer; (2) second-degree price discrimination (or menu pricing), which involves creating a menu of options for consumers to choose from; and (3) third-degree price discrimination (or group pricing), which involves setting different prices for known groups of consumers. Personalised pricing is intriguing, because if the firm knew the maximum amount that every consumer was willing to pay for the good or service, they could set that price for each consumer, thereby extracting the maximum possible amount of profit from every consumer.
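
A toy example (with invented willingness-to-pay values and zero marginal cost) shows why personalised pricing is so attractive to the firm:

```python
# Five consumers' maximum willingness to pay for one unit each (invented numbers)
wtp = [10, 8, 6, 4, 2]

# Uniform pricing: the firm can only pick one price, trading off
# margin against the number of consumers who buy at that price.
best_uniform = max(p * sum(1 for w in wtp if w >= p) for p in wtp)

# Perfect personalised pricing: charge each consumer exactly their maximum.
personalised = sum(wtp)

print(best_uniform, personalised)  # 18 vs 30: all consumer surplus is extracted
```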

To date, this sort of perfect personalised pricing remains a theoretical proposition. Although many firms collect a lot of data about their consumers, they still don't know exactly what each consumer is willing to pay. But what if they did? What other aspects of the market would then matter? Would the degree of competition in the market matter? How would a firm's competitors react?

Those are the sorts of questions addressed in this recent working paper by Patrick Kehoe, Bradley Larsen, and Elena Pastorino (all Stanford University). Specifically, they:

...take an extreme forward-looking view by supposing that personalized pricing is feasible and analyze the resulting equilibrium pricing patterns and its efficiency properties.

The paper is understandably theoretical (since perfect personalised pricing is still not possible). However, they also apply their model to eBay data on purchases of Apple and Samsung smartphones and tablets. In this context, the branding of the products matters, as does the experience nature of the goods (the consumer doesn't know the quality of the good until after they have purchased it). Kehoe et al. show using their theoretical model that:

...the strategic interaction among firms is complex: firms not only compete directly to attract a consumer in the current period, but also strategically manage the information flow to the consumer. Specifically, by appropriately choosing the prices for its product varieties, a firm can make a certain variety the most attractive and hence control how much is learned about a consumer’s taste for its products...

In other words, firms use pricing to obtain information about consumers' willingness to pay, and then use that information to price in the future. Would personalised pricing make consumers worse off? It seems like it should. However, using the smartphone and tablet eBay data combined with their theoretical model, Kehoe et al. find that:

...a significant fraction of consumers benefit from the introduction of price discrimination. Specifically, consumers who benefit are those with relatively similar beliefs about their tastes for Apple’s products or Samsung’s products, whereas consumers who are harmed are those who have a high taste for the products of only one firm. This latter group is more “captive” and, correspondingly, firms’ profits from these consumers are higher under discriminatory pricing than under uniform pricing.

The finding that, if personalised pricing was possible, then some consumers may actually be made better off, is somewhat surprising. However, those consumers who benefit are those who are most willing to switch products. In contrast, those who are unwilling to switch find that the price will be much higher. In my ECONS101 class, we discuss customer lock-in, and one of the ways that firms can lock customers in is through brand loyalty. Customers who are loyal to a particular brand are less willing to switch, and that may be especially the case where owning a particular brand becomes part of the consumer's identity (as is often the case for Apple users). Locked-in consumers often face higher prices, since the firm knows that those consumers will be less willing to switch to alternative products.

I hadn't considered the interaction between price discrimination and customer lock-in before, but it makes a lot of sense. It also works the other way of course. Firms will obtain a lot more information about consumer preferences from consumers who are locked into purchasing from them. Fortunately for those consumers, we are still not yet at the stage where this is any more than a theoretical possibility.

Friday, 14 April 2023

Book review: Economics for the Common Good

There is an unwritten expectation that Economics Nobel Prize winners will write a book soon after their award (if they have not done so already), expounding their great ideas or contributions to the discipline in a way that is accessible to a general audience. However, as many of these great economists have spent their careers in academic circles, not all are well equipped to connect with a generalist audience. I am reminded of the essay The Hedgehog and the Fox by the philosopher Isaiah Berlin, which draws on a quote from the ancient Greek poet Archilochus: "a fox knows many things, but a hedgehog knows one big thing". Some economists are hedgehogs, and they have made deep contributions in a single area of research (or several closely related areas). Others are foxes, with contributions across multiple different areas of research.

When it comes to economics research, the 2014 Nobel Prize winner Jean Tirole is a fox, having made contributions to our understanding and regulation of natural monopoly, platform markets (or two-sided markets), and several other research areas. So, I was looking forward to reading his 2017 book Economics for the Common Good, which I have delayed reading for far too long (in part, daunted by its 550 pages including notes). And the book starts well. Tirole summarises the aim of economics for the common good as:

Economics works toward the common good; its goal is to make the world a better place. To that end, its task is to identify the institutions and policies that will promote the common good.

The 'common good' that Tirole refers to is the collective well-being of society. There is little to argue against this as a goal for economics, or for the book. However, as a fox, Tirole knows many things, and it appears that this book is to be the venue for all of those things. So, rather than a clear and well scaffolded exposition of economics' potential contribution to the common good, from an expert in particular institutions, the book is instead a collection of several different, and tenuously related, threads, any of which could have been expanded into a book treatment on its own. Part I talks about economics and society, markets and market failures, before Part II outlines the role of the economist and some of the basic assumptions of economics. Part III discusses an institutional framework for the economy, and Part IV presents and discusses a number of macroeconomic challenges, including climate change, labour markets, the European project, finance and the 2008 financial crisis. Finally, Part V delves into industrial economics and competition policy, an area where Tirole has made the majority of his research contributions.

The problems with such a diverse book are twofold. First, Tirole isn't able to give a deep treatment to any of the particular topics. This is problematic in the early chapters, which feel like they are underexplored, and in the last part of the book, which is too technical and could easily have been explained in a more accessible way if more space was devoted to it. Second, while the common good was presented at the beginning of the book, I felt like it was less of a unifying theme than might have been expected given the title and the introduction. The book simply tried to do too many things, and would have benefited from a focus on a narrower aspect of the research areas in which Tirole has great expertise, such as regulation. In that sense, a book more like Alvin Roth's Who Gets What - and Why (which I reviewed here) would have worked better.

So, overall, I was a little disappointed in the book. Having said that, there were a number of highlights. Unlike many books by economists and others, Tirole does acknowledge realistic political constraints on decision making, such as:

The enthusiasm for top-down approaches originates in governments' desire to appear to be doing something to tackle climate change. Patchy but expensive initiatives that are visible to voters but concealed from consumers (because they are included in feed-in tariffs imposed on electric utilities or in the price of goods and services) are politically less costly than a carbon tax, which is very visible to those who have to pay it. Subsidies are always more popular than taxation, even if, in the end, someone has to pick up the bill for them.

I also really appreciated Tirole's views on government and the market:

This analysis shows that the market and the state are not alternatives but, on the contrary, are mutually dependent. The proper functioning of the market depends on the proper functioning of the state. Conversely, a defective state can neither contribute to the market's efficiency nor offer an alternative to it.

And:

Some of those who want change envision a vague alternative in which the market would no longer be central to society; others, on the contrary, favor a minimalist state that would make laws and dispense justice, maintain order and conduct national defense, the minimum functions necessary to enforce contracts and property rights necessary for free enterprise. Neither of these two approaches help deliver the common good.

Tirole also has some fun in the book, including a short discussion of the economics of 'dwarf tossing'. The best parts of the book to me, though, were those parts at the end where Tirole has the most experience, and the most to offer. I made a number of notes that I will incorporate into my teaching of platform markets and natural monopoly. However, in general those sections were more technical than necessary. If they had been made a little more accessible to the average reader, this would have been a much better book.

Thursday, 13 April 2023

Who loses status, and who gains status, with ChatGPT

When there is some social or economic change, Tyler Cowen (of Marginal Revolution fame) likes to look at who will gain in status, and who will lose in status, as a result. I've been thinking a bit about this in relation to ChatGPT. Who really benefits, and who really loses? A lot of others have obviously been thinking about this as well, especially in relation to the labour market (as in my previous post). However, I want to discuss it in relation to a particular context - that of signalling.

First, we need to understand what a signal is, and why we provide signals. Signalling is a way of overcoming problems of adverse selection, which is a problem of asymmetric information. Think about each person's quality, as measured by some attribute (intelligence, perhaps). Each person knows how intelligent they are, but we don't know. This is asymmetric information. Since we don't know who is intelligent and who is not, it makes sense for us to assume that everyone has low intelligence. This is what economists call a pooling equilibrium. Pooling equilibriums create problems. For example, if you can't tell people who are intelligent apart from people who are not, you may treat everyone as if they have low intelligence. That won't be good for anyone.

How can someone reveal that they are intelligent? They could just tell us, "Hey, I'm smart". But, anyone can do that. Telling people you are intelligent is not an effective signal. To be effective, a signal needs to meet two conditions:

  1. It must be costly; and
  2. It must be costly in such a way that those with low quality attributes (in this case, those who are less intelligent) would not be willing to attempt the signal.

An effective signal provides a way for the uninformed party to sort people into those who have high quality attributes, and those who have low quality attributes. It creates what economists call a separating equilibrium.
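
To make the two conditions concrete, here is a minimal numeric sketch in Python. All of the numbers (the benefit of being treated as high-quality, and the signalling cost for each type) are made up purely for illustration:

    # A minimal numeric sketch of the two conditions for an effective signal.
    # All of the numbers are made up for illustration.
    BENEFIT = 10     # payoff from being treated as high-quality (e.g. a better job offer)
    COST_HIGH = 4    # cost of producing the signal for a high-quality person
    COST_LOW = 12    # cost of producing the same signal for a low-quality person

    # Condition 1: the signal must be costly.
    condition_1 = COST_HIGH > 0 and COST_LOW > 0

    # Condition 2: it must be costly in a way that deters low-quality people
    # (their cost exceeds the benefit), while high-quality people still find
    # the signal worthwhile.
    condition_2 = COST_LOW > BENEFIT and COST_HIGH < BENEFIT

    if condition_1 and condition_2:
        print("Separating equilibrium: only high-quality people signal.")
    else:
        print("Pooling equilibrium: the signal does not sort people.")

Anything that pushes COST_LOW below BENEFIT breaks the second condition, and the signal stops separating. Keep that in mind for what follows.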

Ok, now let's come back to the context of ChatGPT. There are a lot of contexts in which writing well provides a signal of high quality for a variety of different attributes. Writing well is costly - it takes time and effort. It is also costly in a way that people with low quality attributes would not attempt, because they would be easily found out, or because writing well would take them much more time and effort than it takes people with high quality attributes. Now that ChatGPT is available (along with Bing Chat and other LLMs), the cost of writing well falls for everyone, including those with low quality attributes. The second condition no longer holds, which eliminates the signalling value of writing well. That will lower the status of anyone who needs to write well in order to signal quality.

Now, let's consider some examples. Lecturers use essays to sort students into a grade distribution. Writing well is a signal of a student's quality, in terms of how well they have met the learning objectives for the paper. Students who write well get higher grades as a result. ChatGPT reduces the signalling value of writing well, meaning that an essay can no longer create a separating equilibrium for students. This is why I have argued that the traditional essay is now dead as an assessment tool. Smart students are likely to lose status as a result.

This can be extended to academic writing more generally. Academics get published in part as a result of the quality of their writing. Writing well is a signal of an academic's quality, in terms of the quality of their research. Academics who write well are more likely to get published. ChatGPT reduces the signalling value of writing well, meaning that good academic writing cannot be taken as a signal of the quality of the research. Good academics may lose status as a result.

There are lots of similar contexts, where the explanations are similar to those for students and academics. Think about journalists, authors, poets, law clerks, government policy analysts, or management consultants. Anyone who has ever read a policy document or a management consulting report will realise that the sort of meaningless banality you see in those documents and reports can easily be replaced by ChatGPT. The likes of McKinsey should be freaking out right now. ChatGPT is coming for their jobs. Good journalists, authors, poets, law clerks, government policy analysts, and management consultants will likely lose status.

There is one more context I want to highlight, which is a particular favourite of mine when teaching signalling to first-year students: online dating. It is difficult for a person to signal their quality as a potential date on a dating app. Anyone can write a good profile, and use a stock photo. However, one of the few signals that might be effective is the conversation on the app before a first date. A 'good' date should be able to set themselves apart from a 'not-so-good' date, by the things they say during a conversation. However, with ChatGPT in the picture, the signalling value of what people write in dating app conversations is reduced (in contrast to the assertions in this article in The Conversation). I wonder how long it will be before we end up in a situation where one instance of ChatGPT is talking to another instance of ChatGPT, because both dating app users are using ChatGPT at the same time (it has probably happened already). Anyway, good quality dates will lose status as well.

So, who actually gains status from the arrival of ChatGPT? That depends on what we do to replace the signals that ChatGPT has rendered useless. Perhaps we replace good writing as a signal with good in-person interactions. So, if lecturers use more oral assessments in place of essays, then smart students who are good talkers (as opposed to good writers) will gain status. Academics who are good presenters at conferences, or in TED Talks or similar formats, will gain status. Podcasters (especially live podcasters, and other live performers) may gain status. The management consultants who present to clients may gain status (as opposed to those who do the writing). And so on.

What about online dating? It would be tempting to say that in-person meet-ups become more important as a screening tool, but this tweet suggests that might not be effective either. If, as demonstrated in that tweet, anyone can have ChatGPT projected onto a pair of glasses in real time and then read from a prompt, then even the people who I suggested in the previous paragraph would gain status might not do so.

Or perhaps the quality of the underlying ideas becomes more important than simply good writing. The quality of the thinking still provides a good signal (at least, until ChatGPT becomes a lot more intelligent). That would help the top students, academics, journalists, authors, and poets to set themselves apart. However, it is much more difficult for the non-expert to judge the quality of the underlying ideas. It would be tempting to think that this raises the status of peer reviewers and critics. However, they can't easily signal their quality and are at high risk of losing status to ChatGPT. And if the expert judges can't be separated from the inexpert judges, then the quality of ideas can't be a good signal for non-experts to use. This is looking bleak.

Maybe lots of signals are about to be rendered ineffective? I feel like we should be more worried about this than anyone appears to be.

Tuesday, 11 April 2023

ChatGPT and the labour market

ChatGPT has created a lot of anxiety, and not just about what it means for existential risk (see here), or for students cheating on assignments (see here). A lot of concern has been raised about what the rise of large language models (LLMs) means for the future of work. That is, what jobs are likely to be affected, and in particular, what jobs are likely to be replaced by LLMs?

Some initial thoughts on this have been provided by this new working paper by Ali Zarifhonarvar (Indiana University Bloomington). Zarifhonarvar mines text data from the International Standard Classification of Occupations (ISCO) to identify jobs where generative AI (such as ChatGPT and other LLMs) will:

not have any influence, full impact, or partial impact on various occupations. Both the full and partial impacts can have a negative or positive outcome.

Zarifhonarvar isn't very specific about the occupations, but groups them together at the highest level of the ISCO classification, and reports that:

...in the "Professionals" category, 95 occupations are estimated to have a full impact from ChatGPT, 22 have a partial impact, and 9 have no impact. Similarly, for the "Technicians and Associate Professionals" category, 60 occupations are estimated to have a full impact, 34 have a partial impact, and 16 have no impact. In the "Managers" category, 20 occupations are estimated to have a full impact, 21 are a partial impact, and 6 are estimated to have no impact.

In the "Clerical, Service, and Sales Workers" category, 8 occupations are estimated to have a full impact, 20 have a partial impact, and 14 have no impact. In the "Craft and Related Trades Workers" category, 8 occupations are estimated to have a full impact, 33 have a partial impact, and 45 have no impact. In the "Plant and Machine Operators and Assemblers" category, 5 occupations are estimated to have a full impact, 19 have a partial impact, and 34 have no impact.

...in the "Skilled Agricultural and Trades Workers" category, 4 occupations are estimated to have a full impact, 24 are estimated to have a partial impact, and 3 are estimated to have no impact. In the "Services and Sales Workers" category, 3 of the occupations are estimated to have a full impact, 37 have a partial impact, and 18 have no impact. In the "Armed Forces Occupations" category, all the occupations are estimated to have no impact from ChatGPT. In the "Elementary Occupations" category, 0 of the occupations are estimated to have a full impact, 16 are estimated to have a partial impact, and 35 are estimated to have no impact.

It will probably surprise no one that jobs for professionals, technicians and managers are most at risk from LLMs, and jobs in 'elementary occupations' (which includes sales, unskilled agriculture and fishery jobs, and labourers) will be the least affected. However, Zarifhonarvar's analysis is very crude and fairly speculative. We can expect some more thorough analyses, including the first studies using real-world data, to become available before long. Until then, sit tight and hope that the robots aren't coming for your job!

[HT: Les Oxley]

Monday, 10 April 2023

The game theory of an AI pause

My news feed has been dominated over the last week by arguments both for and against a pause on AI development, prompted by this open letter from the Future of Life Institute, which called for:

AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.

Chas Hasbrouck has an excellent post summarising the various views on AI held by different groups (and examples of the people belonging to each group). Tyler Cowen then suggested on the Marginal Revolution blog that we should be considering the game theory of this situation (see also his column on Bloomberg - as Hasbrouck notes, Cowen is one of the 'Pragmatists'). I want to follow up on Cowen's suggestion, and look at the game theory. However, things are a little complicated, because it isn't clear what the payoffs are in this game - there is so much uncertainty. So, in this post, I present three different scenarios, and work through the game theory of each. For simplicity, each game has two players (call them Country A and Country B), and each player has two strategies (pause development of AI, or speed ahead).

Scenario #1: AI Doom with any development

In this scenario, if either country speeds ahead and the other doesn't, the outcomes are bad, but if both countries speed ahead, the planet faces an extinction-level event (for humans, at the least). The payoffs for this scenario are shown in the table below (listed as Country A's payoff, Country B's payoff):

                             Country B: Pause    Country B: Speed ahead
    Country A: Pause         (0, 0)              (-10, -5)
    Country A: Speed ahead   (-5, -10)           (Extinction, Extinction)

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track, for each strategy of each player, the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If Country B chooses to pause development, Country A's best response is to pause development (since a payoff of 0 is better than a payoff of -5);
  2. If Country B chooses to speed ahead, Country A's best response is to pause development (since a payoff of -10 is better than extinction);
  3. If Country A chooses to pause development, Country B's best response is to pause development (since a payoff of 0 is better than a payoff of -5); and
  4. If Country A chooses to speed ahead, Country B's best response is to pause development (since a payoff of -10 is better than extinction).

In this scenario, both countries have a dominant strategy to pause development. Pausing development is always better for a country, no matter what the other country decides to do (pausing development is always the best response).

For anyone who believes in this scenario, pausing development will seem like a no-brainer, since it is a dominant strategy.

Scenario #2: AI Doom if everyone speeds ahead

In this scenario, if both countries speed ahead, the planet faces an extinction-level event (for humans, at the least). However, if only one country speeds ahead, then AI alignment can keep up, preventing the extinction-level event. The country that speeds ahead earns a big advantage. The payoffs for this scenario are shown in the table below (listed as Country A's payoff, Country B's payoff):

                             Country B: Pause    Country B: Speed ahead
    Country A: Pause         (0, 0)              (-2, 10)
    Country A: Speed ahead   (10, -2)            (Extinction, Extinction)

Again, let's find the Nash equilibrium using the best response method. In this game, the best responses are:

  1. If Country B chooses to pause development, Country A's best response is to speed ahead (since a payoff of 10 is better than a payoff of 0);
  2. If Country B chooses to speed ahead, Country A's best response is to pause development (since a payoff of -2 is better than extinction);
  3. If Country A chooses to pause development, Country B's best response is to speed ahead (since a payoff of 10 is better than a payoff of 0); and
  4. If Country A chooses to speed ahead, Country B's best response is to pause development (since a payoff of -2 is better than extinction).

In this scenario, there is no dominant strategy. However, there are two Nash equilibriums, which occur when one country speeds ahead, and the other pauses development. Neither country will want to be the country that pauses, so both will be holding out hoping that the other country will pause. This is an example of the chicken game (which I have discussed here). If both countries speed ahead, hoping that the other country will pause, we will end up with an extinction-level event.

For anyone who believes in this scenario, pausing development will seem like a good option, even if only one country pauses development. However, no country is going to willingly buy into being the one that pauses.

Scenario #3: AI Utopia

In this scenario, if both countries speed ahead, the planet reaches an AI utopia. The fears of an extinction-level event do not play out, and everyone is gloriously happy. However, if only one country speeds ahead, the outcomes are good, but not as good as they would be if both countries sped ahead. Also, the country that speeds ahead earns a big advantage. The payoffs for this scenario are shown in the table below (listed as Country A's payoff, Country B's payoff):

                             Country B: Pause    Country B: Speed ahead
    Country A: Pause         (0, 0)              (-2, 10)
    Country A: Speed ahead   (10, -2)            (Utopia, Utopia)

Again, let's find the Nash equilibrium using the best response method. In this game, the best responses are:

  1. If Country B chooses to pause development, Country A's best response is to speed ahead (since a payoff of 10 is better than a payoff of 0);
  2. If Country B chooses to speed ahead, Country A's best response is to speed ahead (since utopia is better than a payoff of -2);
  3. If Country A chooses to pause development, Country B's best response is to speed ahead (since a payoff of 10 is better than a payoff of 0); and
  4. If Country A chooses to speed ahead, Country B's best response is to speed ahead (since utopia is better than a payoff of -2).

In this scenario, both countries have a dominant strategy to speed ahead. Speeding ahead is always better for a country, no matter what the other country decides to do (speeding ahead is always the best response).

For anyone who believes in this scenario, speeding ahead will seem like a no-brainer, since it is a dominant strategy.
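
For readers who want to check all three scenarios for themselves, here is a short Python sketch of the best response method. The payoffs are the ones from the tables above; extinction and utopia have no numeric payoffs in the scenarios, so -100 and +20 stand in for them here (any sufficiently extreme values give the same answers):

    # Finding the Nash equilibriums for all three scenarios using the best
    # response method. Payoffs are (Country A, Country B), keyed by
    # (A's strategy, B's strategy).
    PAUSE, SPEED = 0, 1
    EXTINCTION, UTOPIA = -100, 20   # stand-in values, not from the scenarios

    scenarios = {
        "Scenario 1 (doom with any development)": {
            (PAUSE, PAUSE): (0, 0), (PAUSE, SPEED): (-10, -5),
            (SPEED, PAUSE): (-5, -10), (SPEED, SPEED): (EXTINCTION, EXTINCTION)},
        "Scenario 2 (doom if everyone speeds ahead)": {
            (PAUSE, PAUSE): (0, 0), (PAUSE, SPEED): (-2, 10),
            (SPEED, PAUSE): (10, -2), (SPEED, SPEED): (EXTINCTION, EXTINCTION)},
        "Scenario 3 (AI utopia)": {
            (PAUSE, PAUSE): (0, 0), (PAUSE, SPEED): (-2, 10),
            (SPEED, PAUSE): (10, -2), (SPEED, SPEED): (UTOPIA, UTOPIA)},
    }

    labels = {PAUSE: "pause", SPEED: "speed ahead"}

    for name, payoffs in scenarios.items():
        equilibriums = []
        for a in (PAUSE, SPEED):
            for b in (PAUSE, SPEED):
                # (a, b) is a Nash equilibrium if neither country can do better
                # by unilaterally switching to its other strategy.
                a_is_best = payoffs[(a, b)][0] >= payoffs[(1 - a, b)][0]
                b_is_best = payoffs[(a, b)][1] >= payoffs[(a, 1 - b)][1]
                if a_is_best and b_is_best:
                    equilibriums.append(f"(A: {labels[a]}, B: {labels[b]})")
        print(name + ": " + ", ".join(equilibriums))

Running it reproduces the results above: only (pause, pause) in Scenario 1, the two asymmetric equilibriums in Scenario 2, and only (speed ahead, speed ahead) in Scenario 3.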

Which is the 'true' scenario? I have no idea. No one has any idea. We could ask ChatGPT, but I strongly suspect that ChatGPT will have no idea either. [*] What the experts believe we should do depends on which of the scenarios they believe is likely to be playing out. Or, given that any of the three scenarios (or millions of other potential scenarios with different players and payoffs) could be playing out, perhaps the precautionary principle should apply? The problem there, though, is that if any country pauses development, the best response in every scenario except the first is for other countries to speed ahead. So, unless all countries can be convinced to apply the precautionary principle, pausing development is simply unlikely.

We live in interesting times.

*****

[*] Actually, I tried this, and ChatGPT refused to offer an opinion. Instead, it said: "...it is crucial that policymakers and stakeholders work together to develop standards and guidelines for responsible AI development and deployment to minimize potential risks and maximize benefits for society as a whole." Thanks, ChatGPT.

Friday, 7 April 2023

Is Uber a substitute or complement for public transport?

Is Uber a substitute or complement for public transport? You could make arguments either way. On one hand, passengers could use Uber instead of public transport. So, if Uber becomes more available or relatively less expensive, some passengers might switch to Uber for their commuting or other journeys - in that case, Uber and public transport would be substitutes. On the other hand, passengers could use Uber to solve the 'last mile' problem. They could take public transport for most of their journey, and then use Uber to 'fill in' the first or last part of their journey, which public transport cannot provide - in that case, Uber and public transport would be complements.

So, which is it? Substitutes or complements? That is the question addressed in this 2018 article by Jonathan Hall (University of Toronto), Craig Palsson (Utah State University), and Joseph Price (Brigham Young University), published in the Journal of Urban Economics (ungated version here). They use data from US Metropolitan Statistical Areas (MSAs) over the period from 2004 to 2015, and apply a difference-in-differences approach. That essentially involves comparing MSAs with and without Uber, before and after Uber was introduced to each MSA. In addition to using a straightforward binary variable to capture Uber's presence (or not), they also use a measure of the intensity of Uber's penetration into each MSA market, based on the proportion of Google searches for "Uber".
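
For readers unfamiliar with the method, here is a stylised difference-in-differences sketch in Python. It is not Hall et al.'s actual specification - the data are simulated, and the entry pattern, effect sizes, and variable names are all hypothetical - but it shows the basic before/after, treated/untreated comparison:

    # A stylised difference-in-differences sketch with simulated data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    rows = []
    for msa in range(50):
        for period in range(10):
            # Hypothetically, Uber enters half of the MSAs from period 5 onward.
            uber = int(msa < 25 and period >= 5)
            # Simulated log transit ridership: MSA and period effects, plus a
            # small positive Uber effect (complements, in the spirit of the
            # paper's finding), plus noise.
            log_ridership = (0.01 * msa + 0.02 * period + 0.014 * uber
                             + rng.normal(0, 0.01))
            rows.append({"msa": msa, "period": period, "uber": uber,
                         "log_ridership": log_ridership})
    df = pd.DataFrame(rows)

    # Two-way fixed effects regression: the MSA and period dummies absorb
    # level differences, so the coefficient on 'uber' is the
    # difference-in-differences estimate of Uber's effect on ridership.
    model = smf.ols("log_ridership ~ uber + C(msa) + C(period)", data=df).fit()
    print(f"Estimated effect of Uber on log ridership: {model.params['uber']:.4f}")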

Controlling in their analysis for total employment and unemployment rates, population, and gas prices, Hall et al. find that:

...when Uber arrives in an MSA, transit ridership does not change much, with a coefficient that indicates there is a 0.26% increase in public transit use which is not statistically significant. However... as Uber becomes more commonly used in the MSA, there is an increase in public transit use, with a standard deviation increase in Uber penetration increasing public transit ridership by 1.4%.

In other words, Uber and public transport are complements. Hall et al. suggest that:

One reason Uber is a complement rather than a substitute for the average transit agency may be that transit is still much cheaper to use. The median minimum Uber fare is $5, while transit fares average just $1. Undiscounted fares for bus or light rail are never above $3, and for those with a monthly pass the marginal fare is zero. Transit is cheaper by enough that Uber’s role in adding flexibility to the transit system is more important than its ability to substitute for riding transit.

So, Uber is simply too expensive to be a substitute for public transport for most passengers. Hall et al. then extend their analysis, and find that:

Uber most strongly complements small transit agencies in large cities. This is likely because a small transit agency in a large city provides the least flexible service in terms of when and where they travel, and so Uber’s ability to add flexibility for such agencies is valuable to riders... In addition, transit riders in larger cities tend to be wealthier, and so there is greater overlap between those who ride transit and can afford to take Uber...

Finally, Hall et al. look at the effect of Uber on commuting times, using data from the American Community Survey. If passengers are using Uber for the 'last mile' portion of their journey, that may reduce their commuting time, but increase traffic and the commuting times for others. Hall et al. find that, as expected:

For public transportation users, the coefficients are large and negative, but the results are not statistically significant, and commute times for private vehicle commuters in large MSAs or those with a small transit agency increased by 1.5–2.5%. Together these results suggest that Uber reduced commute times for public transit users while increasing congestion.

So, in most cases, and especially in large cities with small transit agencies, Uber is a complement for public transport. However, adding Uber to a city is not without cost. Commuters who are not using public transport likely face more traffic congestion as a result of Uber (as I've noted before).


Thursday, 6 April 2023

Tim Harford on network effects, switching costs, and the enshittification of apps

This week, my ECONS101 class has been covering pricing and business strategy. As we discuss in class, pricing strategy is essentially about creating and capturing value - the firm creates value for consumers (or other businesses), and then finds creative ways to capture that value back from them as profits. So, it was timely to read this post by Tim Harford, looking at apps:

The writer and activist Cory Doctorow has coined a memorable term for this tendency for platforms to fall apart: enshittification. “Here is how platforms die,” he wrote in January. “First, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves.”...

Nevertheless, I’m quite sure enshittification is real. The basic idea was sketched out in economic literature in the 1980s, before the world wide web existed. Economic theorists lack Doctorow’s gift for a potent neologism, but they certainly understand how to make a formal model of a product going to the dogs.

There are two interrelated issues at play. The first is that internet platforms exhibit network effects: people use Facebook because their friends use Facebook; sellers use Amazon because it’s where the buyers are, while buyers use Amazon because it’s where the sellers are.

Second, people using these platforms experience switching costs if they wish to move from one to another. In the case of Twitter, the switching cost is the hassle of rebuilding your social graph using an alternative such as Mastodon, even if all the same people use it. In the case of Amazon, the switching cost includes saying goodbye to your digitally locked eBooks and audiobooks if you move over to a different provider. Doctorow is fascinated by the way these switching costs can be weaponised. His short story, Unauthorized Bread, describes a proprietorial toaster that only accepts bread from authorised bakers.

Both switching costs and network effects tend to lead to enshittification because platform providers see early adopters as an investment in future profits. Platforms run at a loss for years, subsidising consumers — and sometimes suppliers — in an effort to grow as quickly as possible. When switching costs are at play, the logic is that companies attract customers who they can later exploit. When network effects apply, companies are trying to attract customers because they will draw in others to be exploited. Either way, exploitation is the goal, and the profit-maximising playbook will recommend bargains followed by rip-offs.

All of this comes back to creating and capturing value. First, the firm uses its shiny new app to create value for consumers. The app can create a lot of value if consumers can access it for free. That sucks consumers into the network. A large network then creates value for advertisers, because it represents a large audience for their advertising. Finally, the firm can capture that value back from the advertisers in the form of profits. However, as Harford's post notes, the process of capturing value back from advertisers reduces the value that consumers get from the app. Since those consumers face switching costs to change to some other app, though, they are locked in. So, the firm is probably fairly unconcerned about the consumers' loss of value, so long as they continue to use the app (and create value for advertisers).
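
A toy two-period example makes the logic concrete. All of the numbers here are made up for illustration; none of them come from Harford's post or Doctorow's writing:

    # A toy two-period example of 'bargains followed by rip-offs'.
    VALUE = 5.0          # value per period a consumer gets from an ad-free app
    SWITCHING_COST = 3.0 # hassle of rebuilding your network on a rival app
    AD_PRICE = 1.0       # what advertisers pay the platform per unit of ad load
    AD_NUISANCE = 1.0    # value each unit of ad load destroys for the consumer

    # Period 1: no ads. The app is at its best, and consumers join the network.
    ad_load_1 = 0.0

    # Period 2: consumers are locked in. They stay as long as putting up with
    # the ads (VALUE - nuisance) beats moving to an ad-free rival and paying
    # the switching cost (VALUE - SWITCHING_COST), so the platform can push
    # the ad load to just under the switching cost.
    ad_load_2 = SWITCHING_COST - 0.01

    consumer_surplus = ((VALUE - AD_NUISANCE * ad_load_1)
                        + (VALUE - AD_NUISANCE * ad_load_2))
    platform_revenue = AD_PRICE * (ad_load_1 + ad_load_2)
    print(f"Consumer surplus over the two periods: {consumer_surplus:.2f}")
    print(f"Platform ad revenue over the two periods: {platform_revenue:.2f}")

    # The period-one bargain is an investment; the switching cost is what lets
    # the platform capture the value back in period two.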

Just because pricing strategy is about creating and capturing value, that doesn't mean that firms must focus on creating value for consumers. They can be more profitable by creating value for someone else instead - in this case, advertisers.