How Federal Researchers are Practicing Responsible Data-Sharing

Researchers at NIH, DOE and the Census Bureau developed new methods for securely sharing sensitive data.

Government is working to unlock the power of data and advanced analytics to drive its decision-making, but this effort requires careful attention to privacy standards. Researchers at the National Institutes of Health (NIH), Department of Energy (DOE) and the U.S. Census Bureau are developing best practices for responsibly sharing sensitive data.

National Institutes of Health 

At the onset of the pandemic, NIH needed to scale data-sharing to study COVID-19 epidemiology. The agency used Fast Healthcare Interoperability Resources (FHIR) standards and APIs to quickly and securely share health care information.

“Basically, that is a messaging system. From university to university, from hospital to hospital, you can use this API to share the standard and fast way,” said Belinda Seto, NIH deputy director for the Office of Data Science Strategy, at Imagine Nation ELC this week. “We have to do this fast because the epidemic was spreading so quickly. How can we then get an idea of the epidemiology? What are the groups that are most disproportionately affected, and in what regions of the world, and what parts of the country?” 

However, patient-level data often contains Personally Identifiable Information (PII) and other protected information. So, the agency decoupled patient identities from patient data to protect sensitive information and adhere to privacy policies.

“We're interested in what's happening to the patients, especially the ones that were hospitalized,” Seto said. “So, how can we share the data from the electronic medical record? We clearly need to respect the privacy of the patients and keep the data confidential. … We share it by de-identifying the patient, not to disclose any personal identifying information according to HIPAA.”
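To make the idea of decoupling identities from patient data concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not NIH's actual pipeline: the field names, the `deidentify` helper and the keyed-hash pseudonym are all hypothetical, and removing a handful of fields is not by itself enough to satisfy HIPAA de-identification requirements.

```python
import hashlib
import hmac

# Hypothetical field names for illustration; real medical record exports differ.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with direct identifiers removed.

    The original patient ID is replaced by a keyed hash (a pseudonym), so
    records for the same patient can still be linked without exposing the ID.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["pseudonym"] = digest[:16]
    return cleaned


if __name__ == "__main__":
    sample = {
        "patient_id": "12345",
        "name": "Jane Doe",
        "phone": "555-0100",
        "hospitalized": True,
        "region": "Northeast",
    }
    print(deidentify(sample, secret_key=b"example-key-kept-by-the-data-owner"))
```

In this sketch the secret key stays with the data owner, so a recipient cannot recompute the pseudonym from a known patient ID.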

Department of Energy

At the DOE, researchers often conduct analyses on sensitive or protected data. The department has a wide-reaching mission spanning scientific innovation, nuclear security and maintaining the national power grid.

Kelly Rose, the technical director for the Science-based AI/ML Institute at the National Energy Technology Laboratory, began her career at the DOE conducting a data warehousing effort to help understand the Deepwater Horizon oil spill.

“That spill didn't happen in a microcosm,” Rose said at Imagine Nation ELC. “There were root causes that were there were multiple that led to this. … I've spent the last 15 years building, for at least our niche of the Department of Energy, a public-private data collaboration and curation library laboratory, based on the lessons learned just from that environmental spill.”

Rose’s team carefully curates private or restricted data, such as the data they gathered from the Deepwater Horizon oil spill, with the expectation that it may become public. She has seen firsthand how analysis from the 2010 oil spill has helped inform future crisis avoidance and management efforts.

“We're continuing to produce that next generation product that can then go public,” Rose said. “And that's what we've done with our Energy Data eXchange (EDX) platform. We are rapidly expanding this platform, it's a multi-cloud instance, trying to connect with our colleagues across the government to be more effective for our community base in the energy, environmental, and social R&D domains so that we can continue to grow the power of data for our end users.”

U.S. Census Bureau

At the Census Bureau, data sharing is a complicated task. Due to the nature of its mission, much of the Census Bureau’s data is highly protected, and the bureau is unable to share its information directly with other agencies.

But a new joint pilot program with the Internal Revenue Service (IRS) will use statistics to help other agencies gain new insights from their own data sets.  

“In the case of the stimulus checks from the CARES Act, there's an interest to see how well that's actually helping the economy, whether it's being distributed equitably,” said Ron Jarmin, Census Bureau deputy director and chief operating officer, at Imagine Nation ELC. “Tax data does not have codes like race and ethnicity, but of course the Census Bureau does. And so, we're doing a pilot project with relatively sophisticated techniques where they send us their information.”

The bureau is building a model of race and ethnicity at the micro level on IRS data. This model goes through a privacy filter before it is delivered back to the IRS, which then matches it to its other internal data sets.

“They match it to other IRS data, and then use some statistical techniques that take care of biases,” Jarmin said. “We found a way, using more modern statistical models, where we're not sharing the data, but we're sharing insights from a model that's based on our data that makes their data much more powerful.”
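The general pattern of sharing model outputs instead of records can be sketched in a few lines of Python. This is a simplified illustration, not the bureau's actual method, which the article does not detail: the area-level probability table, the `MIN_CELL_SIZE` suppression threshold standing in for the "privacy filter," and both function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

MIN_CELL_SIZE = 10  # hypothetical threshold; small areas are withheld entirely


def build_shared_model(protected_rows, min_cell_size=MIN_CELL_SIZE):
    """Build {area: {group: probability}} from protected (area, group) rows.

    Only aggregate probabilities leave the data holder. Areas with fewer than
    `min_cell_size` records are suppressed as a stand-in privacy filter.
    """
    counts = defaultdict(Counter)
    for area, group in protected_rows:
        counts[area][group] += 1

    model = {}
    for area, groups in counts.items():
        total = sum(groups.values())
        if total < min_cell_size:
            continue  # too few records to release safely
        model[area] = {group: n / total for group, n in groups.items()}
    return model


def estimate_totals(model, recipient_rows):
    """Apportion the recipient's (area, amount) rows across groups.

    The recipient never sees individual-level group labels, only the
    probabilities in `model`.
    """
    totals = defaultdict(float)
    for area, amount in recipient_rows:
        for group, probability in model.get(area, {}).items():
            totals[group] += probability * amount
    return dict(totals)


if __name__ == "__main__":
    protected = [("A", "x")] * 6 + [("A", "y")] * 6 + [("B", "x")] * 3
    model = build_shared_model(protected)
    print(model)
    print(estimate_totals(model, [("A", 100.0), ("B", 50.0)]))
```

Here the recipient's rows for the suppressed area simply go unallocated; a production approach would need a more careful privacy mechanism and bias correction than this sketch provides.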

Jarmin hopes the Census Bureau will be able to use this method to help other federal, state and local government agencies in the future.