Sunday, 25 January 2026

The Census Tree project

An exciting (and new-ish) dataset offers us an unprecedented opportunity to explore research questions using historical US Census data. When I posted about what's new in regional and urban economics last year, one of the developments raised was the linking of historical Census records over time. That linking was based on the work of Abramitzky et al., known as the Census Linking Project (CLP). However, in a recent article published in the journal Explorations in Economic History (open access), Kasey Buckles (University of Notre Dame) and co-authors report on an alternative Census linking dataset that has far larger coverage than the CLP. As they explain:

In the Census Tree project, we use information provided by members of the largest genealogy research community in the world to create hundreds of millions of new links among the historical U.S. Censuses (1850–1940). The users of the platform link data sources—including decennial census records—to the profiles of deceased people as part of their own family history research. In doing so, they rely on private information like maiden names, family members’ names, and geographic moves to make links that a researcher would never be able to make using the observable information...

The result is the publicly-available Census Tree dataset, which contains over 700 million links among the 1850–1940 censuses...

The article describes the creation of the Census Tree dataset, which can be accessed for free online. Buckles et al. also demonstrate the dataset in a particular application, comparing it with the CLP data of Abramitzky et al.:

...who show that the children of immigrants were more upwardly mobile on average than the children of the U.S.-born in the late 19th and early 20th centuries. We replicate this result using the Census Tree, and are able to increase the precision of estimates for each sending country. Furthermore, the Census Tree includes sufficient numbers of links to produce estimates for an additional ten countries, including countries from Central America and the Caribbean. We find that the sons of low-income immigrants from Mexico had significantly worse outcomes on average than sons of fathers from other countries, including U.S.-born Whites. We further extend [Abramitzky et al.] by analyzing the mobility of women in a historical sample, and compare these results to historical estimates for men and modern estimates for women. While the patterns for daughters and sons are broadly similar, differences in marriage patterns contribute to gender gaps in mobility in some countries.

As I noted in this post last year, the ability to link people over long periods of time (including between generations) has opened up a wealth of new research questions. Buckles et al. offer a peek at the range of research that has already been done using the Census Tree dataset (see Appendix B in the paper for a bibliography).

Now, the coverage isn't perfect, and there is still some way to go. You can evaluate the quality of the dataset based on what Buckles et al. report in their article, but it is clearly better than previous efforts. And importantly:

...we plan to update the Census Tree every two-to-three years to incorporate new information added by FamilySearch users, to include new links... and to implement methodological advances in linking methods that we and others develop.

This seems like a really important resource for researchers in economics, sociology, regional science, and other fields, and not just for those interested in economic history.
