This project is the first comprehensive examination of African North Americans who crossed the U.S.-Canada border, in either direction, after the Underground Railroad, in the generation alive roughly 1865-1930. It analyzes census and other records to match individuals and families across the decades, despite changes or ambiguities in their names, ages, “color,” birthplace, or other details. The main difficulty in making these matches is that census data, even for people whose identity is confirmed, does not stay consistent from decade to decade. Someone might be recorded under a nickname rather than their given name (Betsy for Elizabeth); women might marry or remarry and change their names; the racial classification assigned by a census taker might change (black to mulatto, or mulatto to white); someone born in Kentucky might say they are from Canada, depending on how the question was asked; and people who were estimating their ages might be recorded as 35 in 1870, 40 in 1880, and 50 in 1890.
To date, approximately 1,000 matches have been manually generated in a database of 50,000 records, and another 1,000 have been found through my partnership with the Columbia University Data Science Institute Data for Good initiative. Matches were made by looking first at the calculated birth year, then at the name given, location, place of birth, and sometimes at household members. Finding an algorithmic way to predict and identify these matches will allow these records to be paired with other sources, such as government pension data, and will inform research on migration patterns, specific families, and the nodes, whether personal or geographic, that tie these African North American groups together.
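The manual procedure described above (birth year first, then name) could be approximated in code roughly as follows. This is only a minimal sketch, not the project's actual method: the `CensusRecord` fields, the tiny `NICKNAMES` table, the five-year tolerance, and the assumption that the calculated birth year is the census year minus the reported age are all illustrative choices.

```python
from dataclasses import dataclass

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"betsy": "elizabeth", "eliza": "elizabeth", "bill": "william"}


@dataclass
class CensusRecord:
    """One person-row from one census (field names are assumptions)."""
    given_name: str
    surname: str
    age: int
    census_year: int
    birthplace: str

    @property
    def birth_year(self) -> int:
        # Assumed definition of the "calculated birth year".
        return self.census_year - self.age


def canonical_given_name(name: str) -> str:
    """Lower-case a given name and map known nicknames to a full form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def candidates(target, pool, year_tolerance=5):
    """Return records from other census years that might be the same person.

    Filters first on calculated birth year (within a tolerance, because
    reported ages drift), then on the canonical given name. Surname is
    deliberately not required to match, since it can change on marriage.
    """
    wanted = canonical_given_name(target.given_name)
    return [
        r for r in pool
        if r is not target
        and r.census_year != target.census_year
        and abs(r.birth_year - target.birth_year) <= year_tolerance
        and canonical_given_name(r.given_name) == wanted
    ]
```

The candidates returned this way would still need to be checked against location, place of birth, and household members before being accepted as matches.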
Current goals include:
- Continuing to add data by scraping U.S. and Canadian census records from online databases, and by OCR conversion and cleaning of research notes created through a National Endowment for the Humanities grant.
- Developing ways to predict or confirm matches in the data, likely using confidence factors based on name, birth year, family structure, and/or location.
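
One way the confidence factors in the last goal might be combined is a weighted score over the four signals. The sketch below is an assumption-laden illustration rather than the project's algorithm: the weights, the ten-year age window, the dictionary keys, and the use of Python's standard `difflib` string similarity are all placeholder choices that would need tuning against the manually confirmed matches.

```python
from difflib import SequenceMatcher

# Hypothetical weights; they sum to 1 so the final score lies in [0, 1].
WEIGHTS = {"name": 0.4, "birth_year": 0.3, "household": 0.2, "location": 0.1}


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity between two names, from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def birth_year_score(y1: int, y2: int, max_gap: int = 10) -> float:
    """1.0 for identical birth years, falling linearly to 0 at max_gap."""
    return max(0.0, 1.0 - abs(y1 - y2) / max_gap)


def household_overlap(h1, h2) -> float:
    """Jaccard overlap of the given names listed in two households."""
    s1 = {n.lower() for n in h1}
    s2 = {n.lower() for n in h2}
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)


def match_confidence(a: dict, b: dict) -> float:
    """Weighted confidence that records a and b describe the same person.

    Each record is a dict with the (assumed) keys "name", "birth_year",
    "household" (a list of given names), and "location".
    """
    scores = {
        "name": name_similarity(a["name"], b["name"]),
        "birth_year": birth_year_score(a["birth_year"], b["birth_year"]),
        "household": household_overlap(a["household"], b["household"]),
        "location": 1.0 if a["location"].lower() == b["location"].lower() else 0.0,
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
```

Pairs scoring above a chosen threshold could be treated as predicted matches and those in a middle band flagged for manual review; where those thresholds sit is an open question this sketch does not settle.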