When the Supreme Court ruled in June that the Trump administration could not place a citizenship question on the 2020 census, civil rights advocates breathed a sigh of relief. 

Derrick Johnson, president and CEO of the NAACP, called the outcome “a victory for democracy.”

“The administration deliberately sought to increase the political power of whites at the expense of already underrepresented communities,” Johnson said in a statement.

Trump administration officials had claimed the question was needed in order to properly enforce the Voting Rights Act of 1965. But just weeks before the ruling, documents uncovered by the daughter of the late GOP strategist Thomas B. Hofeller suggested a partisan motive for the question. By including it, Hofeller had determined, the administration could acquire detailed data that would aid in redrawing legislative districts “advantageous to Republicans and Non-Hispanic Whites.”

But shortly after being rebuffed by the high court, President Donald Trump ordered federal agencies to mine their records to create a list of noncitizens anyway. 

“We will utilize these vast federal databases to gain a full, complete and accurate count of the non-citizen population, including databases maintained by the Department of Homeland Security and the Social Security Administration,” Trump announced at a White House Rose Garden press conference in July. “We will leave no stone unturned.” 

The same day, Trump issued an executive order requiring the Census Bureau to start collecting citizenship data – and ordering federal agencies to assist the bureau in this task.

In the months since, the bureau has been hard at work figuring out just how to build the president’s list. In a public meeting in September, the Census Bureau’s chief scientist, John Abowd, said a research program was underway at the bureau to meet the requirements of Trump’s executive order. And then he said something explosive: Even without a citizenship question, the bureau now can accurately identify whether a respondent is a citizen at least 90% of the time.

At that meeting of the Census Scientific Advisory Committee, Abowd said the bureau would augment census data with information collected by the Social Security Administration, Department of Homeland Security and state governments to estimate citizenship. 

And in an interview in November, Abowd confirmed that the bureau was on track to provide the citizenship data Trump has been seeking, with 90% accuracy.

“That’s the number we’ve been saying,” Abowd said. “That’s based on the 2010 census and on the assumption that the techniques we used in combination with the 2010 census will be just as effective on the 2020 census.”

What the citizenship data will be used for isn’t yet entirely clear. Abowd insisted that the list of noncitizens will be used only for statistical purposes and that individuals’ citizenship status will be kept secret. And Trump’s July 11 executive order states that the list will be used only “for making broad policy determinations” and “has nothing to do with enforcing immigration laws against particular individuals.”

Civil rights organizations and legal scholars say it’s unlikely that Trump would brazenly turn over the Census Bureau’s list to U.S. Immigration and Customs Enforcement. Doing so would violate a federal law that specifically protects the privacy of census data. And to do so, he’d need the involvement of Census Bureau officials who have sworn an oath to protect the information they gather.

Still, advocates worry.

“I think it would be foolhardy to ever underestimate this administration’s willingness to violate the law,” said Thomas A. Saenz, president and general counsel of the Mexican American Legal Defense and Educational Fund, which is suing the administration over its collection of citizenship data.

As early as 2018, statisticians at the National Academies of Sciences, Engineering and Medicine warned that the administration’s drive to compile a list of citizens appeared to push boundaries.

An academies statistics committee wrote in August of that year that plans to place a citizenship question on the census were not just a “ ‘reinstatement’ of a citizenship question to the decennial census for statistical purposes but rather the intended use of census responses as seed data to construct an ongoing citizenship status registry, something never before proposed as a task for the Census Bureau.”

“If it’s really a registry,” Don Dillman, a member of the committee, told a reporter, “I don’t know where it would start – and where it would end.” 

The White House did not respond to several requests for comment for this story.

There are also lingering questions about how secure the data the Census Bureau is collecting will be. 

In 2016, the bureau organized an internal “hack,” challenging a team of its data scientists to reverse engineer census responses from the broad aggregate datasets that are made public after each count. By applying a mathematical concept called the database reconstruction theorem, the team took the limited public records and successfully identified individual respondents with an extraordinary level of accuracy. 


Meanwhile, the 2020 census is the first census ever to give respondents the option to fill out the form online, which millions of Americans are expected to do. And the Census Bureau’s new IT platforms may themselves be susceptible to data breaches. During a 2018 test of the bureau’s new systems, hackers using Russian IP addresses tried to gain access to the census website, Reuters reported.

Census Bureau officials insist their data systems are secure and say the agency is implementing an additional, far more effective method of keeping any 2020 data that is released anonymous. 

If the Trump administration is going to build a list of noncitizens, said Keshia Morris, census project manager at Common Cause, which advocated against the citizenship question, then “I feel the Census Bureau is probably the best place to do that, because of their confidentiality protections. But if their security isn’t good enough and anyone, including our enemies, can hack that data, then it’s definitely something to be concerned about.” 

When the Census Bureau hacked itself

Census data is supposed to be sacrosanct: locked away, free from prying eyes and special interests.

But while this privacy guarantee applies to individual census responses, the same isn’t true for aggregate data. The Census Bureau has long published hundreds of summary tables of its data – a statistical portrait of who Americans are and where they live and work. The information is widely used by business, government and academia to inform and shape an array of policy decisions.

For decades, the Census Bureau has employed statistical techniques to mask this public data to protect respondents’ privacy. In 2010, the bureau applied a “swapping” technique – essentially switching out information about some households that are at high risk of being individually identified for others within the same geographic area. For example, as The New York Times reported, the bureau swapped out some data about a couple who were the sole residents of Liberty Island, which houses the Statue of Liberty. To protect the couple’s privacy, the bureau switched some of their answers with responses from another couple elsewhere in the state who shared similar characteristics.
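
The swapping idea can be sketched in a few lines of Python. This is a simplified illustration, not the bureau’s actual procedure: the `Household` fields, the rule that a household alone in its block counts as high risk, and the requirement that swap partners share a state and household size are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Household:
    block: str   # hypothetical census-block label
    state: str
    size: int    # number of residents; swap partners must match on this
    race: str    # characteristics that get exchanged
    ages: tuple


def at_risk(households):
    """Indexes of households that are alone in their block, and so easy to single out."""
    per_block = Counter(h.block for h in households)
    return [i for i, h in enumerate(households) if per_block[h.block] == 1]


def swap_households(households, seed=0):
    rng = random.Random(seed)
    out = list(households)
    swapped = set()
    for i in at_risk(out):
        if i in swapped:
            continue
        h = out[i]
        # Partners: same state and household size, but a different block
        partners = [
            j for j, o in enumerate(out)
            if j != i and j not in swapped
            and o.state == h.state and o.size == h.size and o.block != h.block
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        p = out[j]
        # Each household keeps its location but takes the other's answers
        out[i] = replace(h, race=p.race, ages=p.ages)
        out[j] = replace(p, race=h.race, ages=h.ages)
        swapped.update((i, j))
    return out


data = [
    Household("liberty-island", "NY", 2, "A", (61, 59)),
    Household("brooklyn-1", "NY", 2, "B", (34, 33)),
    Household("brooklyn-1", "NY", 3, "B", (40, 38, 9)),
]
result = swap_households(data)
```

Because partners match on household size, the number of households of each size in every block is unchanged, and because answers are exchanged only within the state, state totals stay the same. The cost is that some block-level details no longer describe the people who actually live there.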


Yet even before the 2010 census, some experts had warned that these summary tables could be used to reverse engineer the original individual census responses. Using the database reconstruction theorem, an attacker could combine summary tables with other publicly available information to reconstruct the underlying personal data.
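
The intuition behind reconstruction can be shown on a toy scale. Everything below is invented for illustration: a hypothetical block of three residents and a handful of made-up published statistics. A brute-force search tries every possible combination of ages and keeps only those that reproduce the published tables. The bureau’s real exercise worked across the whole country with far more tables.

```python
from itertools import combinations_with_replacement
from statistics import median

AGES = range(100)  # assume every resident is 0-99 years old

# Hypothetical summary tables published for one three-person block
PUBLISHED = {
    "count_female": 2,
    "count_male": 1,
    "mean_age_female": 36.5,
    "median_age_all": 30,
    # Five-year age brackets: lower bound of bracket -> number of residents
    "age_brackets": {5: 1, 30: 1, 60: 1},
}


def brackets(ages):
    counts = {}
    for a in ages:
        lower = a - a % 5
        counts[lower] = counts.get(lower, 0) + 1
    return counts


def reconstruct(pub):
    """Return every set of residents whose statistics match the published tables."""
    solutions = []
    for females in combinations_with_replacement(AGES, pub["count_female"]):
        # Prune early: the female mean alone rules out most candidates
        if sum(females) / len(females) != pub["mean_age_female"]:
            continue
        for males in combinations_with_replacement(AGES, pub["count_male"]):
            ages = females + males
            if (median(ages) == pub["median_age_all"]
                    and brackets(ages) == pub["age_brackets"]):
                solutions.append((females, males))
    return solutions


print(reconstruct(PUBLISHED))  # [((9, 64), (30,))]
```

Each additional table eliminates candidates. Here a few published figures leave exactly one consistent set of residents (two women aged 9 and 64, one man aged 30), which is why releasing many exact tables for small areas carries risk.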

In response to these security concerns, Abowd in 2016 assigned a team of data scientists to see if they could hack the 2010 census in this way.

The results of this internal “attack” were eye-opening. The team was able to accurately reconstruct the census responses for individuals’ age, sex, race, ethnicity and census block for almost half of the U.S. population. 

Alarmingly, the team also proved it could name names. By comparing results to personal information collected in a commercially available database, the team was able to correctly identify the names of about 17% of the total population, or about 52 million people. 
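
The name-matching step can be illustrated as a simple record linkage on shared fields. This is a hypothetical sketch: the field names, the records and the choice of block, age and sex as matching keys are assumptions for illustration, not details of the bureau’s experiment.

```python
from collections import defaultdict


def link(reconstructed, commercial):
    """Attach a name to each reconstructed record whose block, age and sex
    match exactly one person in the commercial file."""
    index = defaultdict(list)
    for person in commercial:
        index[(person["block"], person["age"], person["sex"])].append(person["name"])
    matches = []
    for record in reconstructed:
        names = index.get((record["block"], record["age"], record["sex"]), [])
        if len(names) == 1:  # ambiguous or missing matches are skipped
            matches.append({**record, "name": names[0]})
    return matches


reconstructed = [
    {"block": "1001", "age": 9, "sex": "F", "ethnicity": "Hispanic"},
    {"block": "1001", "age": 30, "sex": "M", "ethnicity": "Not Hispanic"},
]
commercial = [
    {"name": "Person A", "block": "1001", "age": 9, "sex": "F"},
    {"name": "Person B", "block": "1001", "age": 30, "sex": "M"},
    {"name": "Person C", "block": "1001", "age": 30, "sex": "M"},
]
print(link(reconstructed, commercial))
```

Only the first record gets a name attached, along with its ethnicity; the second matches two people in the commercial file and stays anonymous.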

Hackers might have gone further, bureau officials acknowledged.

“This attack that we simulated is really just the tip of a very large iceberg,” Census Bureau statistician Philip Leclerc told a meeting at the National Academies of Sciences, Engineering and Medicine last year.

Robert Groves, who led the Census Bureau from 2009 to 2012, stressed that at the time of the 2010 census, the bureau was taking all the privacy measures it thought were necessary. It wasn’t until years later that he and others realized how susceptible the public data tables were to reconstruction.

“In retrospect, we were releasing more data than we probably should have,” Groves told Reveal. 

A privacy panacea?

For the 2020 census, the bureau has embraced a new strategy for protecting its public data from hacks: a technique called differential privacy.

According to Harvard University computer scientist Cynthia Dwork, it’s a method of masking or camouflaging large amounts of data by creating a “synthetic” version of the data that’s been collected. That allows the findings and patterns of the information to be shared without the underlying data being published anywhere, where it could be vulnerable to reconstruction.

Dwork, who helped develop the technique, describes differential privacy as “provably future-proof,” meaning it’s immune even to hacking techniques that haven’t been devised yet.


Abowd has echoed this “future-proof” claim at public meetings and in his interview with Reveal.

“The future-proof nature of differential privacy basically assumes infinite computing power and infinite knowledge,” Abowd told Reveal. “Suppose that you knew every bit in the confidential data except one. No amount of future computing will help you to get any closer to that one bit.”  
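
The kind of guarantee Dwork and Abowd describe can be made concrete with the textbook building block of differential privacy, the Laplace mechanism, applied to a single count. This is a sketch, not the Census Bureau’s 2020 production system, which is considerably more elaborate; the function names, the privacy parameter epsilon = 0.5 and the block count of 12 are illustrative assumptions. In the formal guarantee, an attacker who knows every other record cannot shift their odds about the last person by more than a factor of e^epsilon, regardless of computing power.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so Laplace noise of scale 1/epsilon is epsilon-DP
    return true_count + laplace_noise(1 / epsilon, rng)


def laplace_density(x, center, scale):
    return math.exp(-abs(x - center) / scale) / (2 * scale)


def worst_case_ratio(count, epsilon, outputs):
    # How much more likely is each released value when the hidden person is
    # counted (count) than when they are not (count - 1)?
    scale = 1 / epsilon
    return max(
        laplace_density(y, count, scale) / laplace_density(y, count - 1, scale)
        for y in outputs
    )


rng = random.Random(2020)
epsilon = 0.5
print(noisy_count(12, epsilon, rng))  # 12 plus random noise
print(worst_case_ratio(12, epsilon, [y / 10 for y in range(-100, 300)]))
print(math.exp(epsilon))              # the ratio never exceeds this bound
```

The check covers one direction (person present versus absent); the reverse direction is symmetric. A smaller epsilon means more noise and a tighter bound, which is the accuracy-versus-privacy trade-off the bureau has to tune.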

Abowd said by email that differential privacy will be used to protect all the information the bureau collects, including the sensitive citizenship data. “The modern disclosure avoidance system that we have adopted for the 2020 Census, called differential privacy, will ensure that publications, even block-level publications, cannot be used for enforcement activities aimed at individuals, including immigration enforcement,” Abowd wrote.

Kobbi Nissim, a computer science professor at Georgetown University and an expert on database security issues, was more cautious.

“I’m one of the inventors of differential privacy, and I would say this is the best guarantee we have now, but I would not claim that it’s a panacea,” he said. “I think that anybody who is claiming that any measure of privacy protection is a panacea is short-sighted.”

Meanwhile, the Census Bureau faces another fundamental threat: straight-up hacking of the underlying data itself.

In its December report, Reuters found that an overcomplicated rollout of new technology has left the Census Bureau’s computer networks open to this risk. Hackers working from computers with Russian IP addresses attempted to access census data during a test run of the census website in 2018.

Census Bureau officials insist that the threats Reuters identified are overblown and that the sensitive data that will be collected from millions of Americans online will remain safe.

But given that the Census Bureau is for the first time collating perhaps the most sensitive data of all, citizenship data, not everyone is convinced.

“This is the kind of thing that you’d like to have a decade of experimentation with,” said Groves, the former Census Bureau director. “There is no zero risk. That’s just impossible, and that’s our problem.”

This story was edited by Lance Williams and Esther Kaplan and copy edited by Nikki Frick.

Will Carless can be reached at wcarless@revealnews.org. Follow him on Twitter: @willcarless.

Will Carless was a correspondent for Reveal covering extremism. He has worked as a foreign correspondent in Asia and South America. Prior to joining Reveal, he was a senior correspondent for Public Radio International’s Global Post team based in Rio de Janeiro, Brazil. Before that, Will spent eight years at the Voice of San Diego, where he worked as an investigative reporter and head of investigations. During his tenure in San Diego, Will was awarded several prizes, including a national award from Investigative Reporters and Editors. He has been a finalist for the Livingston Awards for young journalists twice in the last five years. He surfs, spends time with his family, travels to silly places and pretends he’s writing a novel.