Emerging privacy-preserving technologies and approaches hold considerable promise for improving data privacy and confidentiality in the 21st century. At the same time, more information is becoming accessible to support evidence-based policymaking.
In 2017, the U.S. Commission on Evidence-Based Policymaking unanimously recommended that further attention be given to the deployment of privacy-preserving data-sharing applications. If these types of applications can be tested and scaled in the near term, they could vastly improve insights about important policy problems by drawing on disparate datasets. At the same time, the approaches could promote substantial gains in privacy for the American public.
There are numerous ways to engage in privacy-preserving data sharing. This paper focuses primarily on secure computation, which permits analysis of sensitive data while guaranteeing that the underlying private information is never revealed to the parties performing the analysis. Three key issues motivated the launch of a domestic secure computation demonstration project using real government-collected data:
- Using new privacy-preserving approaches addresses pressing needs in society. Current widely accepted approaches to managing privacy risks—like preventing the identification of individuals or organizations in public datasets—will become less effective over time. While many practices are currently in use to keep government-collected data confidential, they often fail to incorporate modern developments in computer science, mathematics, and statistics in a timely way. New approaches can enable researchers to combine datasets to improve the capability for insights, without being impeded by traditional concerns about bringing large, identifiable datasets together. In fact, if these new approaches succeed, traditional methods of combining data for analysis may become less necessary.
- There are emerging technical applications to deploy certain privacy-preserving approaches in targeted settings. These emerging procedures are increasingly enabling larger-scale testing of privacy-preserving approaches across a variety of policy domains, governmental jurisdictions, and agency settings to demonstrate the privacy guarantees that accompany data access and use.
- Widespread adoption and use by public administrators will only follow meaningful and successful demonstration projects. For example, secure computation approaches are complex and can be difficult to understand for those unfamiliar with their potential. Implementing new privacy-preserving approaches will require thoughtful attention to public policy implications, public opinions, legal restrictions, and other administrative limitations that vary by agency and governmental entity.
This project used real-world government data to illustrate the applicability of secure computation compared to the classic data infrastructure available to some local governments. The project took place in a domestic, non-intelligence setting to increase the salience of potential lessons for public agencies.
Data obtained under a confidentiality agreement from Allegheny County’s Department of Human Services in Pennsylvania were analyzed to generate basic insights using privacy-preserving platforms. The analysis required merging more than 2 million records from five datasets owned by multiple government agencies in Allegheny County. Specifically, the demonstration relied on individual-level records about services to the homeless, mental health services, causes and incidence of mortality, family interventions, and incarceration to analyze four key questions about the proportion of: (1) people serving a sentence in jail who received publicly funded mental health services; (2) parents involved in child welfare cases who received publicly funded mental health services; (3) people serving a sentence in jail who received homelessness services; and (4) suicide victims who previously received publicly funded mental health services. To BPC’s knowledge, this demonstration is the first of its kind completed in the human services field.
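Each of the four questions above is, at its core, a record-linkage query: match individuals across two agency datasets and compute the share of one population that appears in the other. The following sketch illustrates the shape of such a query in plain (non-secure) form; the dataset names, identifiers, and values are illustrative, not the actual Allegheny County schemas or results.

```python
# Illustrative sketch of the linkage query underlying question (1):
# what proportion of people serving a jail sentence also received
# publicly funded mental health services? All records are synthetic.
jail_records = [{"person_id": i} for i in (1, 2, 3, 4, 5)]
mental_health_records = [{"person_id": i} for i in (2, 4, 5, 9)]

jail_ids = {r["person_id"] for r in jail_records}
mh_ids = {r["person_id"] for r in mental_health_records}

# Link the two datasets on a shared identifier and compute the share
# of the jail population that appears in both.
overlap = jail_ids & mh_ids
proportion = len(overlap) / len(jail_ids)
print(proportion)  # 3 of the 5 jail records match -> 0.6
```

In the demonstration itself, this same kind of computation was carried out without any party ever seeing the linked, identifiable records in the clear.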
To demonstrate and characterize the applicability of privacy-preserving computation for these analyses, the project team performed the analyses on two distinct privacy-preserving platforms. The first platform, called Jana and developed as part of the Brandeis program for the Defense Advanced Research Projects Agency, achieves secure computation entirely in software. Jana uses a combination of encryption techniques to protect data at rest and in transit, and uses secure multiparty computation to protect data during computation. Specifically, Jana uses multiple servers to perform computation on cryptographic secret shares of data, while assuring that those servers never see the data in decrypted form.
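The idea of computing on "secret shares" can be made concrete with a minimal sketch of additive secret sharing, one common building block of secure multiparty computation. This is a simplified illustration of the general technique, not the actual Jana protocol or codebase.

```python
# Minimal sketch of additive secret sharing over a prime field.
# Each server receives one random-looking share; only the sum of all
# shares (mod PRIME) reveals the original value.
import random

PRIME = 2**61 - 1  # field modulus (a Mersenne prime, chosen for illustration)

def share(value, n_servers=3):
    """Split a value into n shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares to recover the original value."""
    return sum(shares) % PRIME

# Each server holds one share of each input; no single server can
# learn anything about the underlying values from its share alone.
a_shares = share(120)
b_shares = share(45)

# Addition is performed share-wise, without ever reassembling the data.
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 165
```

Real systems layer far more machinery on top of this (secure multiplication, comparison, and query planning), but the core property is the one shown: servers compute on shares they cannot individually interpret.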
The second platform, called FIDES and developed as part of the IMPACT program for the U.S. Department of Homeland Security, achieves secure computation via a hardware-enabled cryptographic enclave. Specifically, FIDES uses an Intel Corporation processor and the Intel Software Guard Extensions to compute in an area of the processor that is restricted from access by other code running on the computer, including the computer’s own operating system. No part of the processor or software, aside from that hardware-secured enclave, ever sees the data in decrypted form.
These two privacy-preserving computation platforms offer similar approaches: data arrive at the computation platform already encrypted, analysis is performed without revealing anything about the underlying data, and results are securely provided to users. The goal in these experiments was to compare these two approaches with a classic data analysis setting. Successful completion of the demonstration with human services data yielded the following insights:
- The experiments produced valid, reliable results. Both platforms generated valid results consistent with traditional data analysis approaches. This outcome suggests that queries using these privacy-preserving approaches are not subject to diminished quality that would affect the validity or reliability of statistical conclusions. Therefore, secure computation models satisfy the demonstration’s core criteria for enabling data use and privacy preservation.
- The efficiency of the experiments presents a trade-off for policymakers. Different modes of operationalizing the privacy-preserving technologies offer trade-offs for answer timeliness. Analyses with nearly 200,000 records using the software-based approach required nearly three hours to complete, whereas the same queries in the hardware-enabled environment returned results in one-tenth of a second. These times have substantial implications for applications in government operations with rapid decision-making architectures.
These findings suggest that these approaches offer considerable promise for public policy in achieving improved data analysis and tangible privacy protections at the same time. However, effort is still needed to further develop privacy-preserving technologies to make their deployment more time-efficient prior to widespread use in government agencies. The scope and scale of such deployments will likely entail either substantial cost implications or substantial delays in response times for computation, depending on the desired trade-off for the privacy-preserving approach. In addition to developing technical precision for privacy guarantees, further development of the technologies must also include learning about approaches for deploying the protections within complex organizational or governmental infrastructures and legal frameworks that may not explicitly encourage such activities.
This demonstration project offers a compelling example of how the technologies can be deployed—which can advance consideration of the approach within domestic, non-intelligence agencies at all levels of government.