Saturday, 29 February 2020

Lecture recording and lecture attendance

Should lectures be routinely recorded and made available to students? This is a question that frequently comes up at staff meetings and at education or academic committees. Arguments in favour of lecture recording include that it increases flexibility for students, helps with revision, reduces the negative impacts of timetable clashes, allows students to manage their non-study commitments (work, family, etc.), and improves equity for students with disabilities or those studying in a second language. Arguments against lecture recording include negative impacts on class attendance, that students relying on recorded lectures miss participating in important in-class activities and discussions, and (occasionally) that guest lecturers (or the lecturers themselves) do not want their recorded images available online.

I want to focus this post on the impact of recordings on class attendance, prompted by this article in The Conversation last week, by Natalie Skead (University of Western Australia) and co-authors:
We conducted a large-scale study in our law school to uncover whether lecture recordings are responsible for declining student attendance and what motivates students to attend or miss class.
By manually counting how many students were in lectures across sixteen different subjects, we found attendance rates averaged just 38% of total enrolments across the semester.
There was a natural ebb and flow of lecture attendance throughout the semester. There was peak attendance at the beginning (57%), a significant drop in the middle as assessments became due (26%) and a rebound at the end of semester as exam season hit (35%).
Attendance at 38% is horrifically low. When asked about reasons for non-attendance, Skead et al. note that:
Availability of lecture recordings was the most common reason students gave for not attending lectures (18% of students said this). But work commitments were a close second (16%). Then it was timetable conflicts (12%), the time and day of lectures (11%) and assessments being due (8%).
So, lecture recording is just one factor among many that affects lecture attendance. In my experience (both as a student and as a lecturer) the biggest impact on attendance is value added. If a lecture consists mainly of reading from pre-prepared PowerPoint slides (especially if they are the default textbook slides), then students will rightly question whether attendance is worth it.

I have recorded all of my lectures in both first-year economics papers for at least the last ten years. Attendance at my lectures doesn't appear to be negatively affected by recording. There was no noticeable drop-off in attendance when I started recording, and when I look at the server logs, it tends to be the students who are attending class who are watching the recordings (for revision), rather than the students who are not attending. I guess it is possible that students are not attending and trying to convince themselves that they will watch the recordings, then failing to do so. However, there isn't any data to support that assertion.

So, why would recording my lectures have little (if any) effect on attendance? I try to make attendance at my lectures worthwhile by punctuating the lectures with exercises - opportunities to practice the material immediately. And while students are completing the exercises, I circulate the room talking to them and providing directed assistance. This interactive teaching approach gives me important feedback on what the students are not understanding, but also helps develop students' learning. This is one aspect of my teaching practice that students most agree is helpful to their learning, and if they're not attending class they wouldn't be exposed to it.

Unfortunately, there are trade-offs with any teaching approach, including interactive teaching. The interactive teaching approach doesn't work quite so well for recorded lectures, unless students are willing to pause recordings and work on the problem before proceeding. In my experience, the majority of students who are watching recorded lectures don't do this (again, the server logs show this - in fact, many students watch only portions of the lectures). This makes attendance in my lectures the number one contributor to students' grades. I don't keep this a secret - on the first day of class, I tell my students how important attendance is going to be, and show them the data (from the previous year) to support it. And when a student appeals their failing grade at the end of a semester, invariably if I look, I will see that their attendance was low.

Encouraging attendance is therefore important. A rational student is going to weigh up the costs and benefits of attending class. Given particular opportunity costs of attending, attendance can be increased if the benefits are increased. I began offering extra credit in my ECON110 (now ECONS102) class in 2012, and extended it to ECON100 (now ECONS101) in 2016 (see this post for more details). So, students receive both an extrinsic reward (extra credit) and an intrinsic reward (better learning) by attending class.

The overall effect is that attendance remains fairly high in ECONS101 and ECONS102 throughout the semester. Of course, as with all papers, attendance in my class does drop off over time, but in ECONS101 last A semester we had attendance of 58% in the second-to-last week (that was the lowest attendance for the entire semester).

So, it is possible to maintain high attendance in spite of routinely recording lectures. Lecturers simply need to make it worth students' while to attend.

Wednesday, 26 February 2020

Why having a safer car could make your car insurance more expensive

When a car owner buys car insurance, they are looking to shift some of the cost of an accident onto the insurer. In exchange, the car owner agrees to pay an annual (or monthly) premium to the insurer.

Is selling insurance worth it to the car insurer? As with any decision, it depends on weighing up the costs and benefits. If the costs of offering insurance are greater than the benefits, then selling insurance would make the car insurer worse off (they would make a loss), and so they shouldn't sell car insurance. If the benefits are greater than the costs, then selling car insurance makes the car insurer better off (they would make a profit), and so they should sell car insurance.

How does the car insurer decide on the insurance premium? Obviously, they need to know how much it will cost them, and then set the premium to be higher than the cost (to ensure that they make a profit). Let's simplify things and say that all cars are the same, and all have the same chance of being involved in an accident. In simple terms, the cost of selling insurance to a car owner is the repair and other costs that the insurer would have to pay in the case of an accident, multiplied by the probability of there being an accident. For example, if there is a 5% chance on average of there being an accident in any year, and the accident would cost $20,000 in repairs and other costs, then the cost of providing insurance would be $1,000 ($20,000 x 5%) per year. In order to make a profit, the insurer would have to charge an insurance premium of more than $1,000 per year.
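The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simplified model in this post (identical cars, a single accident probability, a single repair cost). The function name and the $1,100 premium are my own assumptions for the example, not figures from any insurer.

```python
def expected_claim_cost(accident_probability: float, cost_per_accident: float) -> float:
    """Expected annual cost to the insurer of covering one car."""
    return accident_probability * cost_per_accident


# Figures from the example above: a 5% chance of an accident costing $20,000.
cost = expected_claim_cost(0.05, 20_000)
print(f"Expected cost per policy per year: ${cost:,.0f}")

# Hypothetical premium, chosen only for illustration.
premium = 1_100
print(f"Expected profit per policy at a ${premium:,} premium: ${premium - cost:,.0f}")
```

Any premium above the expected cost leaves the insurer with a positive expected profit per policy. Any premium below it means an expected loss.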

So, what happens to the car insurance premium if cars get safer? The probability of an accident probably goes down, but the cost of repairs goes way up. So, does the cost of providing car insurance go down (less chance of having to pay the repair costs), or up (when there is a repair, the cost of the repair is much higher)? We can't tell - the effect on the cost of providing insurance is ambiguous, because it depends on which of the two changes is larger.
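To see why the direction is ambiguous, here is a rough sketch using the same simple model (expected cost = probability of an accident x cost of an accident). The baseline figures are the 5% and $20,000 from the earlier example. The "safer car" figures (a 4% accident probability, and repair costs of $22,000 or $30,000) are hypothetical numbers that I have made up purely for illustration.

```python
def expected_claim_cost(accident_probability, cost_per_accident):
    """Expected annual cost to the insurer of covering one car."""
    return accident_probability * cost_per_accident


# Baseline from the earlier example: 5% chance of a $20,000 accident.
baseline = expected_claim_cost(0.05, 20_000)

# Hypothetical safer car: accidents are less likely (4%), but repairs cost more.
safer_probability = 0.04
for repair_cost in (22_000, 30_000):
    safer = expected_claim_cost(safer_probability, repair_cost)
    direction = "falls" if safer < baseline else "rises"
    print(f"Repair cost ${repair_cost:,}: expected cost {direction} "
          f"from ${baseline:,.0f} to ${safer:,.0f}")

# The repair cost at which the two effects exactly offset each other.
break_even_repair_cost = baseline / safer_probability
print(f"Break-even repair cost: ${break_even_repair_cost:,.0f}")
```

With these made-up numbers, a modest increase in repair costs leaves the insurer better off, while a large increase more than offsets the lower accident rate.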

However, as this recent article in Wired reports:
American car insurance rates are going up up up. In the past decade, they climbed 29.6 percent, to an average of $1,548 in 2019 from $1,194 in 2011. The surge, detailed in a new report from insurance shopping site The Zebra, outpaced both inflation (by far) and the increase in average car prices (more narrowly). And it came even as the rate of crashes has fallen year over year.
Aggrieved drivers have plenty of directions to point their fingers. Vehicle theft is on the rise, and extreme weather fueled by climate change can destroy swaths of vehicles in short order...
A more surprising, counterintuitive culprit isn’t the wider world or the person behind the wheel but the car itself. It turns out that new features designed to keep vehicles in their lanes and out of trouble are contributing to rising insurance rates.
That’s because the sensors that power those systems make cars much more expensive to fix when they do crash. Dent a steel bumper, and a few hammer blows gets you back on the road. Smash one on a new car, and it could mean replacing a radar, a camera, and ultrasonic sensors, then calibrating them so they work properly. Replacing a cracked windshield now comes with the extra cost of having someone readjust any cameras that look through the glass. “Technology is playing a bigger role than ever in pricing,” says Nicole Beck, The Zebra’s communications chief. “It’s not actually making it cheaper for people.”
While some studies have shown the effectiveness of emergency braking, insurance companies haven’t yet seen enough evidence to justify a break in rates for most of these features. That’s not to say lane keeping, parking assist, and the rest don’t work. They’re all relatively new, and the actuaries aren’t yet confident that their benefits outweigh the extra costs they incur to repair.
So, these new safety features do reduce accident rates, but because they cost so much more to repair, the cost of providing insurance is increasing. So, insurers are passing the cost onto consumers, in the form of higher insurance premiums. However, this increase in insurance premiums might not last. As the article notes:
The good news for car owners is that the steep upward trend in rates may not last. More data showing the upsides of driver assistance tech may accrue. Repairs should get cheaper as more mechanics learn to replace and calibrate sensors and as the prices of those parts drop. The mystery lies in figuring out how long those trends will take to make their effects felt. “Knowing that it’ll happen eventually is pretty easy,” Carges [the chief actuary at Root Insurance] says. “Knowing exactly when the inflection point is, is not.”
When the costs of repairs come down, the cost of providing insurance comes down, and insurers will start to charge lower premiums for car insurance.

Sunday, 23 February 2020

Book review: Reinventing Capitalism in the Age of Big Data

One of the books I recommended for the Prime Minister's summer reading list was Reinventing Capitalism in the Age of Big Data, by Viktor Mayer-Schönberger and Thomas Ramge. At the time, I thought it would be good for understanding the current period of creative destruction. However, I'm not so sure now.

The premise of the book is to promote the idea of 'data-rich markets', which rely on three elements: (1) Improvements in data ontology (the way we organise data); (2) Advances in matching algorithms; and (3) Machine learning systems to observe us and identify our preferences. To many of us, that probably sounds dystopian and maybe a little tone deaf given the current debates over data and privacy.

Mayer-Schönberger and Ramge are well-intentioned - they suggest that data-rich markets will increase efficiency. That is probably true, but it does not come without cost. In my ECONS101 class, we talk about perfect price discrimination, where every consumer pays a different price. If executed perfectly, with full knowledge of what consumers are willing to pay, every consumer will pay the maximum that they are willing to. Ironically, that approach maximises economic welfare - it is perfectly efficient. However, it involves a substantial transfer of welfare from consumers to producers, and whether that is ideal may be debatable.
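A small numerical sketch may help with the price discrimination point. Everything in it is hypothetical: five consumers with made-up willingness to pay, a seller with a constant marginal cost of 3, and function names of my own choosing. It compares a seller charging one profit-maximising price to everyone with a seller who charges each consumer exactly their willingness to pay.

```python
def uniform_pricing(willingness_to_pay, marginal_cost):
    """Best single price for the seller.

    Returns (price, profit, consumer_surplus). Candidate prices are limited
    to the consumers' willingness-to-pay values, which is enough when there
    is a discrete set of buyers.
    """
    best = None
    for price in sorted(set(willingness_to_pay)):
        buyers = [w for w in willingness_to_pay if w >= price]
        profit = (price - marginal_cost) * len(buyers)
        surplus = sum(w - price for w in buyers)
        if best is None or profit > best[1]:
            best = (price, profit, surplus)
    return best


def perfect_price_discrimination(willingness_to_pay, marginal_cost):
    """Each buyer pays exactly their willingness to pay.

    Returns (profit, consumer_surplus). Consumer surplus is zero by construction.
    """
    served = [w for w in willingness_to_pay if w >= marginal_cost]
    profit = sum(w - marginal_cost for w in served)
    return profit, 0


# Hypothetical consumers and cost, for illustration only.
wtp = [10, 8, 6, 4, 2]
cost = 3

price, profit, surplus = uniform_pricing(wtp, cost)
print(f"Single price {price}: profit {profit}, consumer surplus {surplus}, "
      f"total welfare {profit + surplus}")

ppd_profit, ppd_surplus = perfect_price_discrimination(wtp, cost)
print(f"Perfect price discrimination: profit {ppd_profit}, "
      f"consumer surplus {ppd_surplus}, total welfare {ppd_profit + ppd_surplus}")
```

In this toy example, total welfare is higher under perfect price discrimination, but all of it goes to the seller and consumers are left with no surplus at all.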

Mayer-Schönberger and Ramge argue that we make decisions on the basis only of price. They note that:
Price greatly reduces the amount of information that needs to flow through the market; the information is compressed into a single figure for which traditional communications channels are sufficient.
However, to make such a claim is to ignore the fact that, from the consumer's perspective, price is only one element of the decision. Again, thinking about ECONS101, our model of consumer choices demonstrates that a consumer's choice to purchase depends also on the price of all other goods, the consumer's income, and most importantly of all, their preferences. The consumer's preferences are not collapsed into a price. The price may collapse the information for the other side of the market (i.e. for sellers), but not for buyers.

This isn't the only place where Mayer-Schönberger and Ramge betray a misunderstanding of key economic principles. They also have an idea that, in the future, money will become irrelevant, and that we will transact using data. For example, they argue that corporations might pay their taxes in data. However, money has several functions, of which being a medium of exchange (e.g. for paying taxes) is just one. Importantly, money is a unit of account - you can measure the value of things with it. Data can't easily replace money's function as a unit of account. To use their own example, how would the taxes that a corporation needs to pay be measured? No doubt in dollars, i.e. money as a unit of account.

There are some interesting examples in the book, but sometimes they haven't been thought through well enough. For instance, shortly after claiming that banks' business models are doomed to failure (my words, not theirs), they present the example of Robinhood Markets, which allows people to trade stocks on US exchanges with zero commission. They can afford to do that because they "depend on interest generated from deposited but not yet invested funds". But wait? If banks are doomed to failure, then where is this business going to get its interest income from in the future? Isn't it doomed to failure as well? Similarly, based on my experiences, I'm not convinced that Amazon's product recommendation engine is an exemplar for anything (e.g. when will it stop recommending books that I have already bought... from Amazon?).

When I bought this book, I thought it was going to challenge the current market model and present a potential alternative. It does that, but in the opposite direction of what I expected! I'm not convinced that we need more freedom for markets. The book fails to engage with some of the obvious critiques that it will engender. First, how would data-rich markets prevent or mitigate a new Global Financial Crisis? It seems to me that, if anything, the risks would be greatly increased. Second, although the authors argue the opposite, won't data-rich markets simply embed decision-making biases further into decisions through their extensive use of algorithms? Cathy O'Neil's book Weapons of Math Destruction (which I reviewed here) seems very relevant.

In short, I'm kind of glad this book didn't make it onto the Prime Minister's summer reading list. I recommend that you avoid it too.

Saturday, 22 February 2020

The origins and consequences of the Mexican drug cartels

A new article published in the Journal of Development Economics (sorry I don't see an ungated version online), by Tommy Murphy and Martin Rossi (both Universidad de San Andres) tracks the history and consequences of drug cartels in Mexico. There are several surprising bits in this article, especially for those of us with less knowledge of Mexican history. First, on the origins and geographical distribution of the cartels:
The distribution of cartel activities in Mexico is, of course, the result of many different factors, some which are better understood than others. Here we document the particular claim made by some authors... that one of these factors is the Chinese immigration to Mexico at the turn of the 19th century, and provide evidence that its influence seems to persists until today. A series of events justifies this connection. Drug prohibition (mainly in the U.S.) created the market that illicit organizations eventually filled. Yet the time in which this took place (the 1910s) made Chinese migration relevant, particularly the one that settled in Mexico around the turn of the 20th century. During the 19th century many Chinese emigrated and sought refuge in the Americas. For the most part, this flow directed towards the U.S., but in the early 1880s the U.S. introduced restrictions on immigration aimed at Chinese people, many of which end up settling in Mexico. This event is important to understand the onset of drug trade in the region, as there are good reasons to believe the Chinese had a comparative advantage in that trade. One of them is that, outside alcohol and tobacco, the main ‘recreational’ drug consumed at the time was opium... But, along with an advantage in the production of a good whose market remained largely unregulated until the 1920s, Chinese arguably also had developed an advantage on the distribution of illegal goods across the border. With the restriction on Chinese immigration by the U.S., many Chinese south of the border began to gather specialized knowledge on an activity that will prove useful with the introduction of drugs prohibition: smuggling Chinese into the U.S...
So, the areas where Chinese migrants settled were more likely to be areas where opium was cultivated, because Chinese migrants brought opium seeds with them. Chinese migrants had a comparative advantage in producing opium, and then developed a comparative advantage in smuggling into the U.S. as well. This knowledge would eventually filter to the locals in the areas where the Chinese were located, who wanted to take over the business. Murphy and Rossi note that:
...part of the well-recorded sinophobia that eventually lead to the expulsion of most Chinese from Mexico was influenced by criminals wanting to gain control of this lucrative business.
So, after the Chinese were expelled, the Mexicans had control of the opium growing and smuggling operations. Importantly, Murphy and Rossi then go on to show that:
...places where more Chinese migrated at the turn of the 20th century, nowadays are more likely to show cartel activity.
Specifically, they find that municipalities where the Chinese were present in 1930 are 12.8 percentage points more likely to have cartel activity in the 21st Century, than municipalities without Chinese present in 1930. That's after controlling for a lot of population and geographical variables, including the presence of German migrants (which would pick up any areas that are on average more attractive to migrants), and distance to the U.S. border. Drug cultivation and supply are reasonably persistent activities, particularly when they are highly profitable.

Murphy and Rossi then use their initial results (Chinese presence in 1930 is associated with cartel activity in more modern times) to investigate the socio-economic effects of cartel activity (in an instrumental variables analysis). They find that:
...cartel presence strongly associated with good socioeconomic outcomes, such as lower marginalization rates, lower illiteracy rates, higher salaries, and better public services. We also report that cartel presence is associated with higher tax revenues and more political competition, in line with what is reported in Mexican literature.
This result is a surprise, since we typically think about the cartels as being overwhelmingly negative. However, Murphy and Rossi explain that:
...the counterfactual are Mexican municipalities without cartel presence. It is then entirely possible that all municipalities in Mexico are worse off compared to a situation without cartels, even if within Mexico those with cartels are doing relatively better.
Everything is relative, including the impact of drug cartels it seems. They also argue that:
...since their main business is not –in principle– intrinsically based upon violence (as mafia), but on producing and distributing a tradable good, they have local positive effect in the areas where their activities are concentrated.
This point is a little bit harder to believe. We know from lots of media reports that there is extreme violence associated with cartel activity. Perhaps areas that have more cartel activity have more government resources devoted to them? Perhaps the data are subject to some selection bias in those areas? I think we need more investigation on this before we can conclude that cartel activity is positive.

[HT: Marginal Revolution]

Wednesday, 19 February 2020

Gender biases in student evaluations of teaching

Teaching is about to start again for universities in New Zealand. Classes at Waikato start on 2 March. So it seems like an opportune time to talk about how we measure teaching quality, and in particular the ways that measurement is going wrong. The standard approach to measuring teaching quality is to ask students to complete an evaluation, often at the end of a paper or course. Those student evaluations of teaching (SETs) usually involve rating the teacher, and the paper, on some scale, and across one or more criteria. The scores are then combined (there are various ways to do this) to give an overall measure of teaching quality.

There is some ideological support for asking students to rate teaching quality. If you view education as a consumption activity, then students are consumers, and the service provider (the university) wants to know about the experience of their customers. However, the theoretical support for this position is shaky. Education is not a consumption activity - it is a production activity. Education produces human capital, as well as a signal of quality to future employers. At the time that students complete a particular paper, they are in no position to evaluate the quality of that production, because they are not yet making use of it. It would be like asking a car buyer to rate the quality of spark plugs in their vehicle, before they've even had a chance to take it for a drive.

Students are not in a strong position to rate the quality of their education, perhaps until years after that education is complete. And I say this as someone who routinely gets outstanding teaching evaluations (and has multiple teaching awards over the last decade to add substance to that claim).

You may doubt me, but research on SETs backs me up. If students were good at evaluating teaching, then we wouldn't expect to see systematic biases appear in teaching evaluations. So, if SETs routinely rate female lecturers worse than male lecturers, you have the choice of either arguing that it results from female lecturers generally being worse (on average) than male lecturers, or that SETs are biased. And if SETs are biased (which seems like the more valid claim), then it provides evidence that SETs are not a good measure of teaching quality.

There's lots of evidence for gender bias in SETs. I've read several papers that attest to this, just in the last few years, and I'll outline some of them below.

Let's start with this 2016 article, by Natascha Wagner, Matthias Rieger, and Katherine Voorvelt (all Erasmus University Rotterdam), published in the journal Economics of Education Review (ungated earlier version here). They use data from MA students enrolled at the International Institute of Social Studies at Erasmus University from 2010/11 to 2014/15, which included 688 teaching evaluations across 272 courses. Interestingly, the response rate to the teaching evaluations was 87%, much higher than many other institutions achieve. They find:
...significantly lower scores in teaching evaluations for women compared to men, but only once we control for course unobservables. In other words, the documented associations insinuate that teacher evaluations are not gender blind, and gender effects explain roughly one fourth of the sample standard deviation in SETs.
Female lecturers receive teaching evaluation scores that are 0.25 standard deviations lower than those of male lecturers, after controlling for the characteristics of different courses. They also find that:
Women obtain considerably lower teacher evaluations when teaching with men compared to teaching alone or with other women.
When students have the opportunity to compare male and female lecturers within the same course, they give (on average) better teaching evaluation scores to the male lecturer. Finally, this bit was also interesting:
Interestingly, we find that the negative female teacher effect is reversed in the major for gender studies and social justice.
In gender studies and social justice, male lecturers received worse teaching evaluations. That might have something to do with the difference in gender composition of the students, but without student-level data, we would never know.

Moving on, this 2017 article by Anne Boring (Sciences Po), published in the Journal of Public Economics (ungated earlier version here) finds similar results, based on student-level teaching evaluation data for an unnamed university over the period from 2008/09 to 2013/14, which includes over 20,000 observations. That's right - Boring knows the individual evaluations that students gave (rather than the average overall rating), so can control for both teacher effects and student effects (so if a student routinely gives high, or low, ratings, that can be accounted for). She has data for six mandatory courses, where students are unable to select their teacher (and therefore, can't sort themselves into a section taught by a teacher of their preferred gender). She finds that:
...male students give significantly higher overall satisfaction scores to male professors than to female professors. Male students also rate male professors significantly higher than how female students rate both female and male professors... a male professor being rated by a male student is approximately 11 percentage points more likely to be rated as excellent compared to how he would be rated by a female student. As a result, a male professor’s expected excellent overall satisfaction score is approximately 20% higher than a female professor’s expected excellent overall satisfaction score. I also find that students perform equally well on final exams whether their professor was a man or a woman, suggesting no difference in actual teaching effectiveness. Thus, the results suggest that differences in teaching skills are not driving gender differences in evaluations.
Unlike male teachers, female teachers tend to receive similar scores from both male and female students. Notice that teaching effectiveness (as measured by exam performance) doesn't depend on the gender of the teacher (which is a point that Alex Tabarrok made a couple of times last year on the Marginal Revolution blog, see here and here). Digging a little deeper, Boring finds that:
...male and female students tend to give more favorable ratings to male professors on teaching dimensions that are associated with male stereotypes (of authoritativeness and knowledgeability), such as class leadership skills and the professor’s ability to contribute to students’ intellectual development. I find that, on average, students rate female professors similarly to male professors for teaching skills that are more closely associated with female stereotypes (of being warm and nurturing), such as preparation and organization of classes, quality of instructional materials, clarity of the assessment criteria, usefulness of feedback on assignments, and ability to encourage group work.
Gender stereotypes seem to matter. This 2019 article by Whitney Buser (Young Harris College), Jill Hayter (East Tennessee State University), and Emily Marshall (Dickinson College), published in the American Economic Review Papers and Proceedings issue (open access), uses student-level data from several unnamed universities, based on surveys conducted at three points during the semester. It's not entirely clear when the first survey was (perhaps on the second day of class?), but the other two surveys were collected on the day that the students' first exam was returned, and on the day of the final exam. Buser et al. have over 2200 survey responses in their sample, and they find that:
...statistically significant lower ratings of female professors at the beginning of the semester and after the first exam is returned. While ratings of male instructors also improve over the semester, female instructors have significantly lower ratings at the beginning of the semester and after the first exam grade is returned before eventually converging close to the ratings of male instructors.
So, this at least suggests that students' biases might reduce after more exposure to female lecturers, maybe? However, that doesn't explain the persistent end-of-course bias found in other studies.

A very similar study to Anne Boring's was reported in this 2018 article by Friederike Mengel (University of Essex), Jan Sauermann (Stockholm University), and Ulf Zölitz (University of Zurich), published in the Journal of the European Economic Association (ungated earlier version here). They use nearly 20,000 observations of student-level evaluation data from Maastricht University over the period 2009/10 to 2012/13, and again in a setting where students are randomly assigned to a section and a teacher. Their sample includes evaluations for some 735 lecturers. They find that:
...female faculty receive systematically lower teaching evaluations than their male colleagues despite the fact that neither students’ current or future grades nor their study hours are affected by the gender of the instructor. The lower teaching evaluations of female faculty stem mostly from male students, who evaluate their female instructors 21% of a standard deviation worse than their male instructors. Female students were found to rate female instructors about 8% of a standard deviation lower than male instructors.
Notice that the size of the bias is strikingly similar to that reported in Wagner et al. Mengel et al. also find two other interesting results:
When testing whether results differ by seniority, we find the effects to be driven by junior instructors, particularly Ph.D. students, who receive 28% of a standard deviation lower teaching evaluations than their male colleagues. Interestingly, we do not observe this gender bias for more senior female instructors like lecturers or professors. We do find, however, that the gender bias is substantially larger for courses with math-related content...
The gender bias against women is not only present in evaluation questions relating to the individual instructor, but also when students are asked to evaluate learning materials, such as text books, research articles, and the online learning platform. Strikingly, despite the fact that learning materials are identical for all students within a course and are independent of the gender of the section instructor, male students evaluate these worse when their instructor is female.
If you still haven't bought into the conclusion that SETs are seriously biased, the second result (gender bias spills over into how textbooks are evaluated, even when students have the same textbook regardless of the gender of their teacher) probably should be giving you pause.

Finally, you might wonder whether these results are somehow unique to universities in high income countries. It turns out that isn't the case, as this 2019 article by Carolyn Chisadza, Nicky Nicholls, and Eleni Yitbarek (University of Pretoria), published in the journal Economics Letters (sorry, I don't see an ungated version of this one online), shows. Chisadza et al. asked 1599 first-year economics students to watch a 12-minute video, and then complete a quiz and a SET evaluation. Students were randomised as to the gender and race of the presenter on the video, but otherwise the script and the slides were the same. They found that:
...students give higher ratings to female and white lecturers. These differences are most pronounced for female and white students.
It's interesting that they find an effect in the opposite direction to the other studies I highlighted earlier in the post. However, this study also isn't quite as convincing as those other studies, because it's limited to a small number of students in a single course. It does at least show that biases in SETs are probably not limited to universities in high-income countries (it would be interesting to see more studies of bias in SETs using data from universities in developing and middle-income countries, though).

All up, I think it is fairly safe to conclude that SETs are systematically biased, and those biases probably arise from stereotypes. The biases are also seriously consequential for faculty. Teaching evaluations are used in hiring and promotion decisions, and if they are systematically biased against particular groups, then those groups will be disadvantaged in their careers.

We need to re-think the measurement of teaching quality. Students are not consumers, and so we can't evaluate teaching the same way we would evaluate a transaction at a fast food restaurant, by surveying the 'customers'. There are alternatives to SETs that universities should make more use of, including teaching portfolios (where teachers have an opportunity to articulate their teaching approach and support it with evidence), and peer evaluations (which are used much more extensively at primary and secondary schools, for instance). Of course, these alternatives are neither as simple, nor as low-cost, as SETs. However, if you want an evaluation done right, sometimes you have to pay the full cost of conducting that evaluation.

Tuesday, 18 February 2020

Online social networks, social capital, and wellbeing

In economics, capital is essentially defined as a collection of resources that an individual uses to produce goods or services (even if those goods or services are not traded in markets). Some types of capital are obvious, such as machinery or tools (physical capital), or financial wealth (financial capital). Others are a little less obvious, like the stock of human capital (our knowledge, training, and experience, etc.) and natural capital (land, air, water, biodiversity, etc.). Then there is social capital - the social relationships that we have with other people. Social capital is the most difficult to measure (even more difficult than natural capital), but typically we measure it in terms of the number of relationships, and the quality of those relationships, and in terms of quality, we often use the degree of social trust as one measure.

All of that is a long-winded way of talking about the value of online social networks (which is a topic I have discussed before - see the links at the end of this post). In theory, social networks could increase social capital, because they allow us to increase the number of social relationships. However, social networks could also decrease social capital if, in spite of a larger number of relationships, the quality of those relationships is lower. If a social network has a particularly bad culture, the quality might even be negative (if belonging to the network makes us worse off, holding the number of relationships constant).

How can we understand whether social networks have a net benefit (the increase in relationships outweighs any decrease in their quality) or a net cost (the decrease in quality outweighs the increase in relationships)? One way is to look at subjective wellbeing (or life satisfaction, or happiness), and that's what a lot of studies have done (see the links at the end of the post for a few examples). However, fewer studies have also looked at social capital.

One notable exception is this 2017 article by Fabio Sabatini (Sapienza University of Rome) and Francesco Sarracino (STATEC, Luxembourg), published in the journal Kyklos (appears to be open access, but just in case there is an ungated earlier version here). They use data on around 50,000 people from the 2010-2012 waves of the Italian Multipurpose Household Survey. Subjective wellbeing was measured on a 0-10 scale (which is quite common), but online social network use was measured by the yes/no response to the question: "Did you make use of social networking sites such as Facebook and Twitter in the last 12 months?". Social capital was measured by the number of interactions with friends (quantity), and the dichotomous response to the question: "Do you think that most people can be trusted, or that you can’t be too careful in dealing with people?" (quality).

The problem with most studies of online social networks is that they can't show that using the social network causes a change in subjective wellbeing, because perhaps happier (or less happy) people are more likely to use the online social network, in which case the causality runs in the wrong direction. Sabatini and Sarracino try to get around this by using instrumental variables analysis - essentially they predict people's social network access by looking at whether the area they lived in had access to high-speed broadband (DSL or fibre) or not in 2008 (i.e. two years before the survey), then see if the predicted social network access is related to subjective wellbeing. There are issues with these instruments, which I will come back to shortly. Sabatini and Sarracino find that there is:
...a significant and negative correlation between the use of SNS [social networking sites] and subjective well-being which is independent from the controls for social capital.
In their instrumental variables analysis, they find that:
...the proxies of social capital are positively and significantly associated with life satisfaction, while the use of SNS has negative and significant coefficients.
Finally, they use structural equation modelling (SEM) to look at the inter-relationships between online social network use, social capital, and subjective wellbeing. They find that:
The SEM analysis suggests that the significantly negative correlation between SNS use and subjective well-being obtained in OLS estimates is not only the result of a direct negative effect, but it also results from the combination of two indirect channels:
1. the negative correlation between the use of SNS and social trust that negatively affects well-being.
2. the positive correlation between the use of SNS and face-to-face interactions that positively affects well-being.
Taken all together, their results show that online social networks reduce subjective wellbeing, but that is because online social network use is associated with lower quality of social capital (lower trust), even though online social network use is associated with a greater number of social relationships.

This is a nice paper, because of the combination of several methods of analysis. However, the results aren't as strong as the authors claim. The instruments (DSL and fibre broadband access) are not good instruments, because high-speed internet is not necessary in order to access social networks, and because high-speed internet is also associated with increases in the use of many other internet tools (online video, for example). However, despite that, the results are at least consistent with our theoretical predictions from the start of this post. Of most interest may be that the use of online social networks was associated with more face-to-face interactions, rather than just more online interactions.
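To make the logic of the instrumental variables approach a bit more concrete, here is a minimal sketch in Python using simulated data. Nothing here comes from Sabatini and Sarracino's data or code: the function names, the sample size, the true effect of -0.3, and all of the other coefficients are assumptions made purely for illustration, and social network use is treated as a continuous variable to keep things simple. The sketch also assumes a valid instrument (broadband affects wellbeing only through social network use), which is exactly the assumption questioned above.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def iv_demo(n=50_000, true_effect=-0.3, seed=0):
    """Compare a naive regression with two-stage least squares on made-up data."""
    rng = np.random.default_rng(seed)

    # Instrument (assumed): whether the area had high-speed broadband.
    broadband = rng.integers(0, 2, size=n).astype(float)
    # Unobserved disposition: happier people use social networks less here.
    disposition = rng.normal(size=n)

    sns_use = 1.0 * broadband - 0.5 * disposition + rng.normal(size=n)
    wellbeing = true_effect * sns_use + disposition + rng.normal(size=n)

    # Naive regression: biased, because disposition drives both variables.
    _, naive = fit_line(sns_use, wellbeing)

    # Stage 1: predict social network use from the instrument alone.
    a, b = fit_line(broadband, sns_use)
    predicted_use = a + b * broadband
    # Stage 2: regress wellbeing on the predicted (not actual) use.
    _, iv = fit_line(predicted_use, wellbeing)
    return naive, iv


naive, iv = iv_demo()
print(f"naive estimate: {naive:.3f}")
print(f"IV estimate:    {iv:.3f}  (true effect in the simulation: -0.3)")
```

In this simulation the naive estimate overstates the negative effect, because less happy people are assumed to use social networks more, while the two-stage estimate lands close to the true value. If the instrument were invalid (say, broadband also changed wellbeing through online video), the second stage would pick up that effect too.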

More papers in this research area should take account of social capital though, if we really want to understand how online social networks affect subjective wellbeing.

Read more:

Sunday, 16 February 2020

The optimal class time for student learning

Last year in B Semester, one of our ECONS101 lectures was scheduled for 5pm. Les and I both thought that class attendance would be really poor for such a late lecture time. It was lower, but not nearly as low as we expected (having incentives to attend class, in the form of extra credit, probably helped). In the A Semester, we have a 9am class, and it seems that attendance for that one is not much greater. Which raises a question: when is the best time to have class, if we want to maximise student learning?

There's at least a couple of aspects to this. First, students may attend at different rates at different times. Early classes may have low attendance if students are late risers, while late classes may have low attendance if students have evening jobs, sports practices, and so on. Second, students might be able to focus and learn better at different times of the day, such as the afternoon, which is why some schools have experimented with starting the school day later. However, university study is different, and students typically have at least some choice over class times (or whether to go to class).

So, it was interesting to read this 2017 article by Timothy Diette (Washington and Lee University) and Manu Raghav (DePauw University), published in the journal Applied Economics (open access). Diette and Raghav used data from "a private highly selective liberal arts college" over the period from 1999/2000 to 2007/08, to investigate whether class times affect student grades. Using data across all year levels, they have over 115,000 observations. They find that, after controlling for student gender, SAT scores (as a measure of ability), class size, level, experience, and instructor and department fixed effects, grades are lower in the mornings, and higher in the afternoons. The figure below summarises the results from their preferred specification:

All class times before 1pm are associated with lower grades than 1pm classes, and all classes after 1pm (with the exception of 3pm, which is not statistically significantly different) are associated with higher grades than 1pm classes.

Interestingly, when looking at the results for each gender separately, they find that:
Both genders earn lower grades in morning classes and higher grades in afternoon classes... we find that the magnitude of the penalty of 8 am and 9 am classes relative to 1 pm classes for male students is almost double the estimated effect on female students. In addition, male students have a larger benefit from late afternoon classes relative to female students.
The results are not strictly causal estimates, although in their sample, students were randomly allocated to class times (they don't say whether students could subsequently switch classes). This has obvious implications for students. If a student has a chance to take a class that is later in the day, they should do so. It also tells me that I should be less worried about the 5pm lecture times, and more worried about avoiding those 9am starts.

Saturday, 15 February 2020

The beauty premium in politics may result from a lack of knowledge about candidates

Back in 2017, I wrote a post about two research papers that looked at the beauty premium in political contests. The first of those two papers (ungated earlier version here) showed that:
...politicians on the right are indeed more attractive than politicians on the left, using data from Australia, the European Union, Finland, and the United States...
They argue with a nice theoretical model that the reason for these differences is based on two things: (1) attractiveness is itself valuable, and voters are more likely to vote for attractive candidates; and (2) attractiveness signals that politicians have views that are further to the right. So, this explains why the attractiveness premium is greater for politicians on the right in low-information settings (where both effects work in the same direction) compared to politicians on the left (where the effects work in opposite directions, since left-preferring voters are more likely to see an attractive left candidate as being to the right of their views).
Another 2017 article, by Todd Jones (Cornell University) and Joseph Price (Brigham Young University), published in the journal Contemporary Economic Policy (sorry, I don't see an ungated version online), covers similar ground. Jones and Price compare the beauty premium between elections for the U.S. Congress (where candidates are relatively well known) and elections for the House and Senate of individual states (where candidates are generally less well known). They use data for 800 candidates from 400 elections in 2012 (200 U.S. House and Senate elections, and 200 state-level House and Senate elections). The basic results are in line with the rest of the literature, as:
...a one standard deviation increase in a candidate’s beauty is associated with a 1.1 percentage point increase in the fraction of votes received and a 6.0 percentage point increase in the probability of winning the election.
Nothing new there - that's the beauty premium at work. When comparing high-profile and low-profile elections, they find that:
 ...the interaction term between beauty and high-profile election is −1.4 percentage points for vote share and −6.3 percentage points for winning (not significant), indicating that the beauty premium is much smaller for high-profile elections. The interaction term between beauty and incumbency status is also negative, with a coefficient of −2.2 percentage points for vote share and −9.0 percentage points in terms of winning the election, although this latter number is not significant.
Both of those results suggest that, when candidates are better known (as they would be for higher profile elections, and if they are the incumbent), the beauty premium is lower. They also find that:
...for each standard deviation a candidate is above the beauty mean the candidate loses a beauty premium of 1.6 percentage points in vote share for every standard deviation they spend above the sample mean. Generally, the positive 3.3% direct effect of spending outweighs the added beauty premium but it does leave the possibility that spending more could outweigh the beauty premium for candidates more than two standard deviations below the sample mean. 
In other words, candidates that spend more on their election have a lower beauty premium, and if they are ugly enough, the negative beauty premium would more than offset the gains from their election spending. Taken all together, and bearing in mind that these are correlations rather than causal estimates, these results suggest that when voters are less aware of the candidates in the election (as would be the case for low profile elections, non-incumbent candidates, or where the candidate has not spent much on electioneering), the voters use each candidate's attractiveness as a signal of whether they should vote for them. This leads Jones and Price to conclude that:
...increased campaign spending may be socially beneficial by reducing biases that affect how individuals vote.
Many people argue that election campaigns involve too much spending. However, it appears that there is an argument to be made to the contrary, especially for otherwise low profile elections.

Thursday, 13 February 2020

Book review: The Revenge of Geography

I don't often read books on politics or geopolitics. So, Robert Kaplan's 2012 book The Revenge of Geography represented a nice change of pace. The subtitle is "What the map tells us about coming conflicts and the battle against fate". As he notes in the first chapter:
Geography is the backdrop to human history itself. In spite of cartographic distortions, it can be as revealing about a government's long-range intentions as its secret councils... A state's position on the map is the first thing that defines it, more than its governing philosophy even.
However, this is not a book based on a geographically deterministic view of the world. Kaplan is careful to lay out that geography is but one of many influences on geopolitics, albeit a particularly important influence.

The first part of the book outlines a lot of the history of geopolitical thought. I found this difficult reading, since it is not an area of scholarship that I am particularly familiar with. For me, the book improved substantially in the second part, where Kaplan goes into detail in discussing particular regions: Europe, Russia, China, India, Iran, and Turkey. The mix of theory, history, and geography is quite compelling, but very difficult for me to excerpt. However, as one example, take this passage on Russia:
Russia's religious and communist totality, in other words, harked back to this feeling of defenselessness in the forest close to the steppe, which inculcated in Russians, in turn, the need for conquest. But because the land was flat, and integrally connected in its immensity to Asia and the Greater Middle East, Russia was itself conquered. While other empires rise, expand, and collapse - and are never heard from again, the Russian Empire has expanded, collapsed, and revived several times... Geography and history demonstrate that we can never discount Russia.
Or this bit on China:
Sea power suits those nations intolerant of heavy casualties in fighting on land. China, which in the twenty-first century will project hard power primarily through its navy, should, therefore, be benevolent in the way of other maritime nations and empires in history, such as Venice, Great Britain, and the United States: that is, it should be concerned mainly with the free movement of trade and the preservation of a peaceful maritime system. But China has not reached that stage of self-confidence yet. When it comes to the sea, it still thinks territorially, like an insecure land power, trying to expand in concentric circles...
I enjoyed this book, but the title is a bit of a mystery to me. It's hard to see how this book is about the revenge of geography. Kaplan argues in the introduction that air power defeated geography, but it is now getting its revenge. I'm not buying it, as geography matters even in the case of air power. Nevertheless, that is a minor gripe, and this was an interesting book to read.

Tuesday, 11 February 2020

The economic value of thoughts and prayers

The increasingly clichéd response to any tragedy is to offer 'thoughts and prayers' to the victims and their loved ones. The cynical among us note that this is done in order to make the well-wisher feel better. But, is the offer of thoughts and prayers valued by the recipients? That is the question that this recent article by Linda Thunström (University of Wyoming) and Shiri Noy (Denison University), published in the journal Proceedings of the National Academy of Sciences, sought to address.

Now, obviously, thoughts and prayers are not traded in markets (yet!). So, there is no market price for these services. That means that a non-market valuation technique is required. Thunström and Noy used an experiment to determine the value that people placed on receiving thoughts and prayers, which could be positive or negative:
Participants were told that a stranger would receive their description and offer a gesture of support in response. We applied a between-subjects study design and Christians and nonreligious participants were randomized into 1 of 4 conditions (C1 to C4). They participated in a [willingness-to-pay]-elicitation mechanism where they could exchange some or all of their $5 for supportive thoughts from a Christian stranger (C1), thoughts from an atheist stranger (C2), prayers from a Christian stranger (C3), or prayers from a priest (C4).
Essentially, the participants could give up some of the $5 they received for participating in the experiment, in exchange for thoughts and prayers from others. The results were interesting. They found that:
...on average, Christians value prayers from a priest at $7.17... and prayers from a Christian stranger at $4.36... In contrast, the nonreligious are “prayer averse”: on average, they are willing to pay $3.54... for a Christian stranger not to pray for them... Likewise, they are willing to pay a priest $1.66... not to pray for them...
So, thoughts and prayers have positive value for Christians (but interestingly, only from other Christians and not from an atheist), but have negative value for the non-religious (which included atheists and agnostics). In some further analysis, they found that Christians were more likely to agree with statements about the helpfulness of thoughts and prayers. That suggests that Christians are willing to pay for thoughts and prayers because they expect those thoughts and prayers to confer benefits on them.

That reminded me of this 2010 article (open access) by Nobel Prize winner James Heckman, published in the journal Economic Inquiry. The article is entitled "The effect of prayer on God’s attitude toward mankind", and Heckman concludes that:
A little prayer does no good and may make things worse. Much prayer helps a lot.
Presumably, that also demonstrates the benefits of prayer. [*]

[HT for the Thunström and Noy article: Elizabeth Oldfield in Unherd, via Marginal Revolution]


[*] Actually, the article by Heckman isn't serious. He was using the analysis of prayer to illustrate the foolishness of this earlier article by R.S. Singh, published in the Journal of the Royal Statistical Society in 1977.

Monday, 10 February 2020

A little bit of self-plagiarism doesn't hurt everyone, it seems

We all (hopefully) know that plagiarism is bad. Copying someone else's work and passing it off as your own is definitely not okay. But what about plagiarising your own work? As a senior colleague of mine once noted, there's only so many ways that you can describe the same methods and data, so some degree of self-plagiarism is unavoidable if you are using the same or very similar methods and data in multiple research papers. But how much is too much?

Take the following two snippets:

  1. "Compared to other OECD countries New Zealand has a poor crash record. In 1990, New Zealand had the third highest traffic death rate (21.5 deaths per 100,000 population) after Portugal and Spain and ranked the seventh highest out of 21 OECD countries at 3.3 deaths per 10,000 vehicles (Land Transport Safety Authority, 1995). These population and vehicle rates were 58 and 43%, respectively higher than Australian rates, New Zealand’s closest neighbour."
  2. "Compared with other OECD countries, New Zealand has a poor crash record. In 1990, New Zealand had the third highest traffic death rate (21.5 deaths per 100 000 population) after Portugal and Spain and is ranked the seventh highest out of 21 OECD countries with 3.3 deaths per 10 000 vehicles (Land Transport Safety Authority, 1995). These population and vehicle fatality rates were 58% and 43% respectively higher than rates in New Zealand’s closest neighbour, Australia."
The first quote is from the introduction in this 2002 article by Paul Scuffham and John Langley, published in the journal Accident Analysis & Prevention. The second quote is from the introduction in this 2003 article by Paul Scuffham, published in the journal Applied Economics. They're pretty similar, wouldn't you agree?

And before you think I'm cherry picking from the articles, you should read them both (the first one is gated, unfortunately). Here's another bit, from the discussion section:
  1. "We did not include all policy changes in the model primarily because our aim was to establish a link between economic factors and traffic crashes. Furthermore, there may have been more policy changes than observations and many policy changes are introduced simultaneously with other policy changes making distinguishing the effects of policies difficult – especially if dummy variables are used. Other variables not included were weather conditions and public holidays. These factors may have some explanatory power in forecasting crashes. However, the effects of these factors may be captured in the trend, seasonal or residual components of the STSM."
  2. "Not all policy changes were included in the model primarily because the aim was to establish a link between economic factors and traffic crashes. Furthermore, there may have been more policy changes than observations, and many policy changes are introduced simultaneously with other policy changes making distinguishing the effects of policies difficult – especially if dummy variables are used. Other variables not included were weather conditions and public holidays. These factors may have some explanatory power in forecasting crashes. However, the effects of these omitted factors may be captured in the trend, seasonal or residual components of the STSM."
Virtually the entire introductory section, and data and methods section, are identical in the two papers. The results are slightly different between the two papers because the dependent variables were specified slightly differently, but in the main, the results and discussion sections are incredibly similar as well (as the quotes above illustrate).

It's difficult to know how often this sort of thing happens (although Retraction Watch will give you some idea). I only picked it up in this case because I happened to read both of those articles consecutively, as part of background reading for a research project on road accidents in New Zealand (more on that in a future post). It's also not clear where the threshold is between acceptable and excessive self-plagiarism, and no guidelines exist (that I'm aware of).

The key point here is that this is essentially one journal article, and the authors have received double value by publishing it twice. It's impossible to know their motives for this. At best, this is innocent and lazy writing. At worst, it is a cynical gaming of the publication system. Either way, it certainly hasn't hurt Paul Scuffham, who is described in his Griffith University profile as "one of the leading and most productive health economists in Australia and internationally". However, it's easy to be productive when you publish the same thing multiple times.

Sunday, 9 February 2020

The 2015 refugee crisis, attitudes towards immigrants, and the effect of immigrants on subjective wellbeing

I've been catching up on a bit of reading related to immigration, refugees, and their effects on the native-born population (see also this earlier post of mine on a related topic). The 2015 refugee crisis in Europe provides an interesting natural experiment to test a number of theories about immigrant assimilation, the impacts of migrants on natives, and attitudes towards migrants. A 2019 article by Dominik Hangartner (ETH Zurich) and co-authors, published in the journal American Political Science Review (open access), looks at the last of those three.

Using survey data from 2,070 residents of the Greek islands, Hangartner et al. look at how exposure to the refugee crisis has affected attitudes. They compare residents of islands that received any refugees during the crisis to residents of islands that did not, and use an instrumental variables approach. Their instrument is the distance of the island to the Turkish coast, which would be expected to affect the likelihood that an island receives refugees, but shouldn't affect attitudes directly (especially after controlling for a bunch of other variables in their analysis). They find that:
...exposure to the refugee crisis has statistically and politically meaningful effects on natives’ exclusionary attitudes, preferences over asylum and immigration policies, and political engagement. Exploiting the exogenous variation in refugee arrivals caused by distance to the Turkish coast — our instrument — we find that respondents directly exposed to the refugee crisis experience a 1/4 standard deviation (SD) increase in their anti-asylum seeker and anti-immigrant attitudes as well as a 1/6 SD increase in their anti-Muslim attitudes. Compared to respondents on unexposed islands, they are more likely to oppose hosting additional asylum seekers and to support the ban from school for asylum seekers’ children and are less likely to donate to UNHCR and to sign a petition that lobbies the government to provide better housing for refugees.
In other words, it's all bad news. Looking into the reasons for their results, they can exclude economic concerns:
...because refugees quickly left the islands for other European countries, the usual materialist concerns that immigrants compete with natives over scarce resources such as jobs or welfare benefits... do also not apply in this context.
So, it was the mere exposure to the crisis itself that led to these changes in attitudes. Most worryingly, it appears that these attitudinal changes had some persistence over time, because the survey was conducted in early 2017, nearly a year after the refugee crisis had abated.

Hangartner et al. put their results down to:
The inability of the local and European authorities to effectively manage the refugee flows and provide medical support and sanitary services caused chaotic scenes at the hotspots and sparked concerns about the spread of diseases.
That at least suggests that better handling of the situation could have avoided the worst effects. However, it is speculative, since we don't know what would have happened had the crisis been more effectively dealt with. There are two other conclusions from the research that are also worrying:
Our findings of a uniform effect of exposure to the refugee crisis across the sample suggest that this threat triggered exclusionary reactions not only among those already predisposed against immigration, but also among respondents who otherwise would exhibit inclusionary attitudes and have not voted for (extreme) right-wing parties in the past...
...we find that exposure to large numbers of asylum seekers causes natives to become more hostile not only toward refugees, but also toward economic migrants and Muslims, including native Muslims who have been residing in Greece for centuries.
The effects were generalised in the population, and had negative spillovers on attitudes to other out-groups. It may take some time for these effects to dissipate.

However, not all studies show bad news. This 2014 article by Alpaslan Akay (University of Gothenburg), Amelie Constant (George Washington University), and Corrado Giulietti (Institute for the Study of Labor, Germany), published in the Journal of Economic Behavior & Organization (ungated earlier version here), looks at the impact of immigration on subjective wellbeing (life satisfaction, measured on a 1-10 scale) of native-born Germans. Using 170,000 observations from the German Socio-Economic Panel survey over the period from 1998 to 2009, they find that:
...increase of one standard deviation in the immigrant share [in the local labour market area] is associated with an increase of 0.142 standard deviations in natives’ [subjective wellbeing]. This is rather a large effect if one considers that the standardized coefficient for being unemployed is −0.112 and for wage is 0.017.
Local unemployment and GDP don't seem to affect the results, so again there isn't an economic (or labour market) explanation for these results. When they dig a bit further, it is satisfaction with housing (and not satisfaction with job, health, or income) that seems to be driving the overall result.

Despite some attempts by the authors to argue otherwise, it isn't clear to me that these results are necessarily causal - perhaps immigrants are simply more likely to move to areas where people are happier. However, the results are at least suggestive, because the natives are happier where there are more immigrants, but the immigrants are not.

It would be interesting to see some further results on subjective wellbeing after the refugee crisis. The Hangartner et al. results suggest a dramatic change in attitudes following the crisis in an area that the refugees are simply transiting through. It would be interesting (and important for policy purposes) to know whether that effect spills over to their ultimate destinations.

[HT for the Hangartner et al. article: Marginal Revolution, back in January 2019]

Saturday, 8 February 2020

The limits of classroom experiments

I make use of occasional classroom experiments in my ECONS102 class, to illustrate some of the key concepts (including asymmetric information, and common goods problems). Experiments are fun and engaging, for both the students and the lecturer. So, it would be easy to go overboard with experiments, but as I noted in this 2016 post, classroom experiments are subject to diminishing marginal returns. However, as with many novel teaching methods, it occasionally makes me nervous that the experiments are making the more engaged students do better, but alienating the disengaged students even more.

So, I was somewhat unsurprised by the results from this 2016 article by Gerald Eisenkopf (University of Konstanz) and Pascal Sulser (eBay), published in the Journal of Economic Education (it appears to be open access, but just in case there is an ungated earlier version here). Eisenkopf and Sulser ran an experiment in 29 Swiss upper secondary schools (Kantonsschule/Gymnasium), where classes were randomised between:

  1. Classes that employed the usual instructional methods with the usual textbook resources (the Control group);
  2. Classes that introduced the topic using a classroom experiment (the Experiment group); and
  3. Classes that introduced the topic through the teacher developing their own lecture material, but where experiments were not allowed (the Standard group).
In all cases, the evaluation was limited to the topic of common pool resources, and the experiment was remarkably similar to one that I run with my ECONS102 class each year:
At the beginning of the game, the pond contains four fish per player. In each of the 10 rounds, every player may catch between zero and three fish anonymously (by wearing masks). The number of fish remaining in the pond doubles between rounds. However, there is a capacity limit. The pond cannot hold more than four fish per player. Students are told that they win the game by catching the most fish.
The incentive for each player is to catch the most fish (and thereby win the game). However, in doing so, the pond quickly runs out of fish and all players are left with an empty pond. That is the nature of the Tragedy of the Commons.
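The dynamics of the fishing game described above can be illustrated with a short simulation. This is only a minimal sketch under simplifying assumptions that are mine, not the authors': every player follows the same fixed catch rule, and when demand exceeds the fish remaining, players are served in a random order. The function name `play_fishing_game` and its parameters are hypothetical.

```python
import random


def play_fishing_game(n_players, catch_per_player, rounds=10, seed=0):
    """Simulate the pond game and return (catch per player, final pond size).

    Rules taken from the quoted description: the pond starts with four fish
    per player, each player may catch zero to three fish per round, and the
    remaining fish double between rounds up to a cap of four fish per player.
    Assumption: all players use the same fixed catch_per_player.
    """
    if not 0 <= catch_per_player <= 3:
        raise ValueError("each player may catch between zero and three fish")
    rng = random.Random(seed)
    capacity = 4 * n_players
    pond = capacity
    totals = [0] * n_players
    for _ in range(rounds):
        order = list(range(n_players))
        rng.shuffle(order)  # assumption: random queue if the pond runs short
        for player in order:
            caught = min(catch_per_player, pond)
            totals[player] += caught
            pond -= caught
        pond = min(2 * pond, capacity)  # regrowth, limited by capacity
    return totals, pond


greedy_totals, greedy_pond = play_fishing_game(5, 3)
restrained_totals, restrained_pond = play_fishing_game(5, 2)
print(sum(greedy_totals), greedy_pond)
print(sum(restrained_totals), restrained_pond)
```

With five players, everyone catching three fish empties the pond in the second round, and the group lands 25 fish in total. If everyone catches two fish instead, the pond fully regenerates each round and the group lands 100 fish.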

Anyway, how well did the students in each group do? In a test of understanding after the topic, Eisenkopf and Sulser found that:
Absent any treatment intervention, 58 percent of all statements were solved correctly, while roughly 30 percent were answered falsely. Hence, the Control group managed to achieve about 28 percent of the theoretical maximum score (4.7 score points out of 17). Students of both treatment groups fare much better, yielding average scores of 50.49 percent (8.58 points) in the Standard and 50.41 percent (8.57 points) in the Experiment treatment. Evidently, both teaching interventions were able to increase economic understanding considerably, with an effect size of about 0.8 of a standard deviation each... Aggregate scores between the Standard and the Experiment group are remarkably similar...
So, students in either the Experiment or Standard group fared better at the end of the topic than students in the Control group. However, when they look at which students do better, by interacting the treatment effect with students' overall economic knowledge (based on a test of general economic understanding), they find that there are no differences in the Standard group, but that:
...our classroom experiment favors more competent students, while weaker students are worse off than they would be under a regime that depends on conventional teaching.
Based on their discussion of the results, it seems that the classroom experiment crowded out time for reading case studies and working on practice exercises. This wasn't a problem for students who had a high level of economic understanding, presumably because they were able to make the links between the experiment and the concepts quickly. However, the less able students were disadvantaged because they had less opportunity to develop the understanding, which they may have done through more traditional methods.

Unfortunately, as this study demonstrates, sometimes the coolest methods of teaching are not the most effective.

Friday, 7 February 2020

Mobs rule in the economics laboratory

Bullying and victimisation are difficult subjects to study in the real world, because much of the time, the actions are hidden. Victims may be reluctant to speak out due to fear, and perpetrators are certainly not going to admit to their actions. One solution is to test theories in an experimental setting, as this recent article by Klaus Abbink (Monash University) and Gönül Dogan (University of Cologne), published in the journal Games and Economic Behavior (ungated earlier version here) does.

Abbink and Dogan investigate how mobs form, and how they choose their victims. They ran 42 experimental sessions in Amsterdam and Cologne involving 860 research participants, using a new experiment that they termed the 'mobbing game':
In a group of four players, each player can, but is not forced to, nominate a victim among the group members. If players nominate different individuals or no one, then there is no victim, and players receive their default payoffs. If all other players nominate the same individual, then this individual becomes the victim, his payoff is taken away and the bullies... receive an additional payoff.
So, if three of the players in a group can coordinate on choosing a fourth player, the 'victim' loses their payoff and the 'bullies' are rewarded. The game is played in a repeated fashion 20 times (with different variations I'll discuss below), which allows the players to coordinate (should they choose to do so). Abbink and Dogan ran several variants of the experiment. In the first set of variants, some groups had a high payoff to mobbing, others a medium payoff, a minimal payoff, or no gain at all. They found that:
...mobs become more frequent the higher the individual gains from it are. The overall frequencies of successful mobbing are 74.1% in High, 45.0% in Medium, 16.6% in Low, and 6.3% in No Gain.
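The payoff rule of the mobbing game can be sketched in a few lines. This is an illustration only: the function name `mobbing_payoffs` and the payoff values (`default`, `gain`) are hypothetical and are not the amounts used in the experiment.

```python
def mobbing_payoffs(nominations, default=10.0, gain=5.0):
    """Return each player's payoff for one round of the mobbing game.

    nominations[i] is the index of the player that player i nominates,
    or None if player i nominates nobody. A player becomes the victim only
    if every other player nominates them. The victim then loses their
    payoff and each of the others earns default + gain.
    Assumption: default and gain are made-up illustrative values.
    """
    n = len(nominations)
    for candidate in range(n):
        others = [nominations[i] for i in range(n) if i != candidate]
        if all(choice == candidate for choice in others):
            payoffs = [default + gain] * n
            payoffs[candidate] = 0.0
            return payoffs
    # No unanimous nomination, so everyone keeps the default payoff
    return [default] * n


print(mobbing_payoffs([3, 3, 3, None]))  # players 0-2 coordinate on player 3
print(mobbing_payoffs([3, 3, 2, None]))  # coordination fails
```

Because a single dissenting nomination is enough to protect the target, a mob succeeds only when the three other players coordinate perfectly, which is what makes the repeated play important.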
Unsurprisingly, when there are greater gains to be had from mobbing, there is more mobbing. It didn't take long for groups to coordinate on finding a victim either:
If we look at the groups in which a victim existed for at least three periods... we see that it took on average 3.5, 4.7 and 6.3 periods in the High, Medium and Low treatments, respectively.
Generally, if in one round of the game two players chose the same other player, the third player would join them in the following round. So, most of the time, mobbing occurred quickly and was persistent. Abbink and Dogan note that this behaviour is in contrast with the idea that subjects believe in fairness and equity. In fact:
In the High treatment, it could be a sensible group behaviour to rotate the victim’s role. This would capture the efficiency gain without victimising a particular individual. However, we hardly observe any such behaviour. Thirteen out of 17 groups in this treatment coordinated on the same victim at least half of the time.
Abbink and Dogan then turn to the question of why mobbing occurs. Do players engage in mobbing because they fear that if they didn't, they would become the next victim? To test this, they designate one of the four players as 'safe' from being a victim. If a player is safe, they should not be afraid of becoming a victim, and if fear is playing a role in mobbing, then mobbing should be less frequent when one player is safe (because it takes all three players to coordinate on a victim for mobbing to be successful). They find that:
The inclusion of a safe player who cannot be mobbed does not reduce the mobbing rate even when the group size, and hence, the required number of bullies for a successful mob is higher. Moreover, safe players nominate more often and are more likely to be part of a mob, implying that impunity increases greed.
Clearly it's not fear of becoming the victim that causes mobbing, if safe players are more likely to nominate another player. Abbink and Dogan next look at how the players coordinate on choosing a victim - who gets chosen? It turns out that anything that marks a player out as different is effective as a coordinating device. Making one player richer than the others causes that player to be more likely to become the victim, but so does making one player poorer than the others. However:
While the fraction of focal victims is very similar in both treatments, the relative payoff of the focal player affects mobbing frequencies only if the focal player is richer. Overall rates are similar in the Medium and Poor treatments (45.0% versus 48.6%, not statistically significant), but if the focal player is richer, mob formation rate rises to 71.1%... Envy towards the richer player increases mobbing rates, while pity towards the poor seems to play no role.
They then dig into this a bit further, by making a second player a focal point by giving them a different colour than the others, as well as having one player be poorer than the others. They now find that:
[Group] membership indeed plays a role in picking a victim: Subjects are less likely to pick an ingroup member, even when the outgroup victim is poorer. Pity towards the poor does not play a role in mobbing decisions, the poor player instead serves as a coordination device. Further, both payoff difference and colour difference serve as strong coordination devices; players who are different in either dimension are substantially more likely to be chosen as victims than the rest.
Overall across all of their experiments, they conclude that:
...the picture that emerges is that greed is the main driver of mobbing. Subjects themselves confirm this, as an analysis of the questionnaire data shows... There is no evidence for fear or pity playing a role in mob outcomes, and standing out makes one substantially more likely to be a victim.

This is an excellent and interesting paper, although Alex Tabarrok at Marginal Revolution also labelled the results "horrific" and that probably isn't far wrong either. As with all experimental studies, it requires replication though before we can fully accept the results.

[HT: Marginal Revolution, back in November 2018]

Wednesday, 5 February 2020

A long-run measure of country-level subjective wellbeing

Thanks to the Maddison project, we have long-run measures of GDP that go back to 1820 for many countries, and all the way back to 1 C.E. for some countries. However, it is widely acknowledged that GDP is an imperfect measure of wellbeing - at which point, everyone quotes Robert Kennedy's speech at the University of Kansas in 1968:
The gross national product does not allow for the health of our children, the quality of their education, or the joy of their play. It does not include the beauty of our poetry or the strength of our marriages; the intelligence of our public debate or the integrity of our public officials. It measures neither our wit nor our courage; neither our wisdom nor our learning; neither our compassion nor our devotion to our country; it measures everything, in short, except that which makes life worthwhile.
So, what do we do if we want an alternative measure of wellbeing? Many researchers have begun making use of measures of subjective wellbeing (e.g. life satisfaction, or happiness), notwithstanding recently identified problems with these measures (e.g. see this recent post). But the problem is that these measures have only been collected across a few countries, and only since the 1970s (e.g. see the World Database of Happiness).

A recent article by Thomas Hills (University of Warwick), Eugenio Proto (University of Glasgow), Daniel Sgroi (University of Warwick), and Chanuki Illushka Seresinhe (Alan Turing Institute at the British Library), published in the journal Nature Human Behaviour (ungated earlier version here), attempts to fill this gap. They use data from around 8 million books in the Google Books corpus, published in the U.S., U.K., Germany, and Italy. They analysed the sentiment of words in these books:
We use the words published in these books to compute subjective wellbeing at a given time by using affective word norms to derive sentiment from text. Affective word norms are ratings provided by groups of individuals who examine a list of words and rate them on their valence, indicating how good or bad individual words make them feel.
They then validate their data by showing that it correlates with life satisfaction data from the Eurobarometer survey since the 1970s, and that it seems to pick up key expected trends in life satisfaction over the whole period from 1820. These include decreases in life satisfaction in all four countries during World War I, for instance.
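To make the word-norm idea concrete, here is a minimal sketch of how the average valence of a passage could be computed from a table of word ratings. The ratings in `TOY_NORMS`, the 1-to-9 scale, and the function name `mean_valence` are all made up for illustration. The authors' actual index is built from published affective word norms and millions of books, with processing steps not shown here.

```python
import string

# Hypothetical valence ratings (1 = very negative, 9 = very positive)
TOY_NORMS = {"joy": 8.0, "peace": 7.5, "war": 2.0, "famine": 1.5}


def mean_valence(text, norms):
    """Average the valence of every word in text that has a rating.

    Words without a rating are ignored. Returns None if no word is rated.
    """
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    scores = [norms[tok] for tok in tokens if tok in norms]
    return sum(scores) / len(scores) if scores else None


print(mean_valence("Joy and peace.", TOY_NORMS))
print(mean_valence("War, famine!", TOY_NORMS))
```

Averaging a score like this over all of the text published in a country in a given year gives one data point in a long-run series.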

They then demonstrate some other results using their data, such as the following (based on a regression of their National Valence Index measure on life expectancy and GDP growth):
[An] extra year of life expectancy is worth as much as 4.3% annual growth in GDP per capita.
I can see one problem with these data. The meaning of words changes subtly over time, and no doubt the sentiment of words also changes over time. So, measuring the sentiment over nearly two centuries, using word norms from modern times, has the potential to lead to bias. However, it is a measure we didn't have before, and all measures have their limitations. It seems to me that there is a lot of potential for using this measure in some interesting research, and the index can be downloaded from GitHub here.

[HT: The Economist]

Tuesday, 4 February 2020

Quantifying the economic impact of disease

This 2015 review article by Marcello Basili (University of Siena) and Filippo Belloc (University “G. d’Annunzio” of Chieti-Pescara), published in the Journal of Economic Surveys (and appears to be open access), has been sitting in my to-be-read pile for too long. Given that I've seen a few mentions recently of the economic costs of the coronavirus outbreak (e.g. see here and here for two examples), it seemed timely to read it now. And timely to write about it on my blog, as some enterprising research students might be tempted to investigate this topic.

The Basili and Belloc article is focused on reviewing the literature on measuring the economic impact of vector-borne disease. Incidentally, the coronavirus is not a vector-borne disease. Vector-borne diseases require a vector for transmission, like a mosquito or flea, which separates them from water-borne diseases, or non-vector-borne diseases, where transmission is directly between infected and uninfected individuals (which appears to include coronavirus). [*] The differences are important for the measurement of impact, but only in the sense that there is a broader range of options that are suitable for an acute disease outbreak like coronavirus than there are for chronic vector-borne diseases like malaria or dengue.

Having said that, the overall points raised in the Basili and Belloc article apply in both cases. There are two general approaches to measuring this impact:
The methods proposed by the literature can be roughly classified in two categories: macroeconomic and micro-based approaches. Macroeconomic approaches follow a traditional cross-country perspective, in which variations in economic outcome variables at country level are explained as a function of variations in population health regressors... At the opposite end of the spectrum, micro-based methods are bottom-up, since they are based on individual- or household-specific measures of the economic effects of a disease, which are then aggregated at the national level...
Basili and Belloc usefully present examples of both macroeconomic and microeconomic approaches, along with the limitations of each:
Both methods present several weaknesses. On the one hand, the main limit of macroeconomic analyses is that these suffer from endogeneity problems, due to the two-way causality between economic outcomes and VBDs’ incidence; on the other hand, micro-based measures tend to underestimate the true economic impact of VBDs, because they do not capture a number of macroeconomic factors and externality effects.
It seems somewhat obvious to point out that measuring impacts using macroeconomic models (of GDP or the growth rate) will be difficult, because you have to be able to account for all of the factors that contribute to GDP (or growth), or you face a potential omitted variable bias. This is particularly challenging to do well. On the other hand, working up from the microeconomic impacts on individuals or households can lead you to miss important macro-level impacts or spillovers between individuals, households, or regions. So while in some sense the microeconomic approach is simpler (though it often requires a number of heroic assumptions), it can lead to underestimates of overall impact.

For anyone thinking about attempting to measure the economic impact of coronavirus (or any other disease), thinking through the available research methods is particularly important. In addition to the approaches that Basili and Belloc highlight, the impact of an acute disease like coronavirus could be measured using time series analysis (e.g. an event study), or through modelling (using an input-output model or a computable general equilibrium model). In making the choice of suitable methods, this article would be a good one to read (but not the only one, of course!).


[*] This categorisation of infectious diseases into three types is necessarily a simplification.

Monday, 3 February 2020

Dealing with traffic noise pollution in Mumbai

Back in 2016, I composed a modest proposal to deal with red light runners:
My modest proposal is to make the cost more immediate, and costly in time as well as monetary terms: Let's have road spikes that deploy on the white lines at traffic lights, 0.1 seconds after the light turns red. Any red light runners then face an immediate and severe cost of four new tires (blown out by the road spikes). On top of that, it pretty much ensures that there would be no drivers for whom the benefits of red light running outweigh the cost - because the time cost alone (of being forced to stop because all your tires are flat) is sure to exceed the time benefit of running a red light. And that's without considering the monetary cost.
My solution was based on increasing the costs of red light running, so that the costs would outweigh the benefits, and this would incentivise more drivers to obey red lights. Of course, my post was (somewhat) tongue-in-cheek. However, the Indian city of Mumbai is dealing with another traffic issue - the high propensity of drivers to use their horn when stopped at intersections, even if they are stopped at a red light. Their solution has a certain similarity to my solution to red light running. As reported by The Weather Channel:
For the Mumbai's perpetual honkers, who love to blare the horns of their vehicles even when the traffic signal is red, the Mumbai Traffic Police has quietly come up with an unique initiative to discipline them in order to curb the alarming rise in the noise pollution levels in the country's commercial capital.
From Friday (January 31, 2020), it has installed decibel meters at certain select but heavy traffic signals to deter the habitual honkers through a campaign named 'The Punishing Signal'.
Joint Police Commissioner (Traffic) Madhukar Pandey said that the decibel monitors are connected to traffic signals around the island city, and when the cacophony exceeds the dangerous 85-decibel mark due to needless honking, the signal timer resets, entailing a double waiting time for all vehicles.
The benefit of sounding your horn in frustration at waiting is psychological - drivers wouldn't save any time by using their horn. But the costs are also low, so the drivers probably think, 'why not?'. Making the noisy drivers wait longer increases the costs of sounding their horn, and should incentivise drivers to be a little quieter.
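The reset rule can be sketched as a simple countdown. This is a toy model under my own assumptions, not a description of how the Mumbai system is actually implemented: noise is sampled once per second, the red phase lasts a hypothetical 90 seconds, and any reading above the threshold restarts the countdown. The function name `seconds_until_green` is made up.

```python
def seconds_until_green(noise_readings_db, red_duration=90, threshold_db=85.0):
    """Return how many seconds drivers wait before the light turns green.

    noise_readings_db holds one decibel reading per second of waiting.
    Readings beyond the end of the list are treated as silence.
    Assumption: any reading above threshold_db resets the countdown in full.
    """
    remaining = red_duration
    elapsed = 0
    readings = iter(noise_readings_db)
    while remaining > 0:
        level = next(readings, 0.0)
        if level > threshold_db:
            remaining = red_duration  # honking restarts the red phase
        else:
            remaining -= 1
        elapsed += 1
    return elapsed


quiet = seconds_until_green([70.0] * 90)
one_honk = seconds_until_green([70.0] * 30 + [95.0])
print(quiet, one_honk)
```

In this sketch a quiet intersection waits 90 seconds, while a single burst of honking 30 seconds in stretches the wait to 121 seconds, so the time cost falls on exactly the drivers who are impatient to move.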

Can we tackle red light running next, please?
