Reveal's online tool, which uses NamUs data, allows searches of missing and unidentified cases simultaneously and returns side-by-side results. Credit: Rachel de Leon/Reveal

The bodies of more than 10,000 people across the country have never been identified. As part of Reveal’s Left for Dead investigation, which examined the fractured system that fails to link missing people with the unidentified dead, we built a tool to try to make it easier to search those cases.

We wrote programs to scrape data from two federal databases that are part of the National Missing and Unidentified Persons System, or NamUs. Our goal with The Lost & The Found was to streamline the process of matching missing persons with the unidentified dead. Our hope was that the tool could lead to more cases being solved.

NamUs first made its data public in 2007 through a website that offered unprecedented access to open case files. But the public site has limitations. Users complain that they can’t view or search missing and unidentified cases at the same time, making it more difficult to narrow cases. The cases also are displayed on the federal site in a way that prioritizes text over pictures. Except for an initial thumbnail, it takes several clicks to get from a search query to the collection of photographs or artist renderings of missing or unidentified people.

Despite these shortcomings, a community of dedicated amateur Web sleuths from around the country has used the data to solve cases.

We interviewed several sleuths as we designed our application’s interface and functionality. We created a tool designed for this community, hoping it would serve their needs and be useful to a broader audience.

The result of this work was a Django-based application, sped up by a Solr search that filters the two databases simultaneously. Each search generates two API queries – one for missing persons cases and one for unidentified bodies – and the app returns the results in side-by-side columns.
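To illustrate the paired search, here is a minimal sketch of how one set of filters could be turned into two Solr queries, one per database. It is not the application's actual code: the Solr address, the core names ("missing" and "unidentified") and the field names are assumptions made for the example, and values are not escaped for Solr's query syntax.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance
CORES = ("missing", "unidentified")       # hypothetical core names


def build_queries(keywords, filters, rows=20):
    """Return one Solr select URL per core, all sharing the same filters."""
    params = [("q", keywords or "*:*"), ("rows", rows), ("wt", "json")]
    # Each filter becomes its own fq clause, applied identically to both cores.
    params += [("fq", f'{field}:"{value}"') for field, value in sorted(filters.items())]
    query_string = urlencode(params)
    return {core: f"{SOLR_BASE}/{core}/select?{query_string}" for core in CORES}


if __name__ == "__main__":
    for core, url in build_queries("tattoo", {"sex": "Female"}).items():
        print(core, url)
```

Because both URLs carry the same query string, the two result lists stay comparable and can be rendered in adjacent columns.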

The data in the app comes from a regular scrape of the NamUs websites. The federal databases don’t have a public API, but ID numbers for each case are assigned sequentially and are part of each case’s URL. So we visit each potential case URL to see whether something is there and, if so, compare a checksum of the page against a cached version to see whether anything has changed since our last visit.
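The change check can be sketched roughly as follows. This is an illustration under assumptions, not our scraper: the function names are hypothetical, the cache here stores only a SHA-256 digest per case ID rather than the full page, and fetching the page is left out entirely.

```python
import hashlib


def page_checksum(html):
    """Hash the page body so two visits can be compared cheaply."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def classify_page(case_id, html, cache):
    """Compare a freshly fetched page with the cached checksum.

    `cache` maps case ID to the checksum from the previous visit.
    `html` is None when nothing exists at that case URL.
    """
    if html is None:
        return "missing"
    digest = page_checksum(html)
    previous = cache.get(case_id)
    cache[case_id] = digest  # remember this visit for next time
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "changed"
```

Only pages classified as "new" or "changed" would need to be parsed again, which keeps most of the nightly work cheap.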

The databases have more than 45,000 records combined, which is too many pages to visit each night at a speed that won’t risk overloading the federal website’s servers. So each night, we visit the last 30 days of records, plus one-seventh of the rest of the database. This means each record is updated at least once a week.
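One way to express that nightly schedule is sketched below. The details are assumptions: it supposes we know the date each record was first seen, and it rotates through the older records by taking the case ID modulo seven, which may not be how the production scraper divides them.

```python
from datetime import date, timedelta


def ids_to_visit(records, today, recent_days=30, slices=7):
    """Pick the case IDs to scrape tonight.

    `records` maps case ID to the date the record was first seen.
    Recent records are always visited; older ones are split into
    `slices` groups, and one group is visited each night.
    """
    cutoff = today - timedelta(days=recent_days)
    todays_slice = today.toordinal() % slices
    return sorted(
        case_id
        for case_id, added in records.items()
        if added >= cutoff or case_id % slices == todays_slice
    )


if __name__ == "__main__":
    sample = {1: date(2015, 1, 1), 2: date(2015, 1, 1), 100: date(2016, 2, 20)}
    print(ids_to_visit(sample, date(2016, 3, 1)))
```

Since the slice index advances by one each day, any seven consecutive nights cover every older record once.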

Our team also standardized many field names between the two databases to simplify the user’s experience (and our programming tasks).
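As a rough illustration of that standardization, a per-database lookup table can rename raw fields to shared names before the records are indexed. Every field name below is hypothetical, chosen only to show the idea; the real NamUs labels and our internal names may differ.

```python
# Hypothetical raw labels from each database, mapped to shared names.
FIELD_MAP = {
    "missing": {
        "Sex": "sex",
        "Race": "race",
        "State last seen": "state",
        "Date last seen": "date_last_seen",
    },
    "unidentified": {
        "Estimated sex": "sex",
        "Ethnicity": "race",
        "State found": "state",
        "Date found": "date_found",
    },
}


def standardize(record, source):
    """Rename a scraped record's fields to the shared vocabulary."""
    mapping = FIELD_MAP[source]
    clean = {"source": source}
    for raw_name, value in record.items():
        # Fields without a mapping keep their original name.
        clean[mapping.get(raw_name, raw_name)] = value
    return clean
```

With shared names such as "sex" and "state", a single filter can be applied to both databases without special cases in the search code.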

Members of the Web sleuth community use different methods to solve cases. Some rely on memory, matching peculiar details in cases with others they have seen before. Some copy details of the cases into their own databases or spreadsheets. Others use detailed analysis of facial structures, focusing on cases in which the missing and unidentified dead have recognizable facial photos.

Despite the many working methods, a few stood out that we thought we could optimize. Most sleuths work on two screens or browsers, side by side. Most rely to some degree on pictures, especially those of identifying features such as faces, tattoos or jewelry. Many start with an unidentified person and then filter through missing persons cases to improve the odds of finding a match, because there are fewer unidentified bodies than missing people. Many wanted a regional search option for cases involving a person who crossed state lines.

From this feedback, we designed a layout with side-by-side columns for the missing and unidentified, with the ability to filter through both databases simultaneously. Users also can search the database by keywords. Search results prioritize available photos and essential case information – such as sex, date last seen, date found and race.

Help make a match

Help match the missing and unidentified through our Web tool, The Lost & The Found, which allows you to search NamUs data, including photos and other details, side by side. More information is available about how to use this tool and what to do if you have a missing loved one.

When structuring our page, we knew pictures were critical, but veteran Web sleuth Polly Penwell told us that looking through the photos of the dead can be emotionally exhausting. The photographs often are brutal – not something anyone would want to stumble upon accidentally. We needed to place photos prominently for the application to be effective, but we didn’t want to shock casual visitors coming to the site.

So we added a warning that visitors see upon entering the site.

We also added a photo blur that users can toggle on and off, giving them the ability to see as much or as little as they want. Blurring the photos, instead of simply hiding them on toggle, also hints at the content underneath and makes revealing them less of a shock.

After looking through a page of results, users can click on a case to see more details. Clicking on a case also locks it into place on the page. They can then filter down potential cases on the other side of the screen and compare them to the selected case.

Once users select a missing and an unidentified case, a button appears at the top of the page so the combination can be reported as a possible match. Clicking on the button brings up a form that asks users why they think this is a potential match and reminds them to check that the match has not already been ruled out in the federal database. Missing persons who have been ruled out are listed in the details of the unidentified body record. Users who submit matches get an email with instructions on how to contact the agencies in charge of the cases. The matches also go to several journalists at Reveal.

While this application makes it easier to become an amateur Web sleuth, it can’t solve all the weaknesses in the NamUs system. There are gaps in the federal data because local authorities are not required to report their cases to NamUs.

Similarly, because NamUs data comes from so many sources, many fields are filled out incorrectly or incompletely, thus making them incorrect or incomplete in our own database. In one case, an unidentified body’s glass eye had been misspelled as “gless eye,” which obscured a potentially important identifying feature. Sometimes, names of missing people are entered differently from how they are spelled in news reports.

The availability of so much information makes solving cold cases seem deceptively simple. While we were making the app, it became a running office joke to ask Allison every couple of days whether she’d solved a case yet, as though she were failing if she didn’t find a match. But the task is much trickier than it appears.

Users have submitted more than 100 potential matches so far, and some look quite promising.

This story was edited by Jennifer LaFleur and copy edited by Nikki Frick.

Allison McCartney can be reached at amccartney@cironline.org, and Michael Corey can be reached at mcorey@cironline.org. Follow them on Twitter: @anmccartney and @mikejcorey.

Allison McCartney

Allison McCartney recently graduated from Stanford University with a master's degree in journalism, specializing in computational and data journalism. In 2014, she was one of two AP-Google Journalism and Technology scholars, and in 2015 she received a Magic Grant from the Brown Institute for Media Innovation to work on a web dashboard for analyzing government contracting data. Before coming to Stanford, she worked as the editor of PBS NewsHour Extra, the educational resource site for the PBS NewsHour. Her work has appeared in various media outlets including MOTHERBOARD, SFGate.com, Entrepreneur Magazine and KQED. Allison is originally from Plano, Texas, and also holds a degree from Washington University in St. Louis.

Michael Corey is Reveal's senior data editor. He leads a team of data journalists who seek to distill large datasets into compelling and easily understandable stories using the tools of journalism, statistics and programming. His specialties include mapping, the U.S.-Mexico border, scientific data and working with remote sensing. Corey's work has been honored with an Online Journalism Award, an Emmy Award, a Polk Award, an IRE Medal and other national awards. He previously worked for the Des Moines Register and graduated from Drake University. He is based in Reveal's Emeryville, California, office.