Working to find actionable solutions to the nation's key challenges.

A New Technology May Revolutionize Privacy-Preserving Data Analysis: Secure Multi-Party Computation

By Nick Hart, Kira Fatherree

Friday, August 24, 2018

Valuable knowledge and insights can be gained from analyzing data, but too often available data are not effectively used to inform decision-making in government, business, or nonprofit organizations.

The U.S. Commission on Evidence-Based Policymaking’s unanimous findings presented to the president and Congress in 2017 concluded that emerging technologies are improving data analysis and privacy protections, which could help ensure government programs and services efficiently meet the needs of the American public.

What is this privacy-preserving technology?

One emerging technology, called secure multi-party computation, could soon allow for analysis of data while simultaneously strengthening privacy protections. This dual impact is potentially valuable to government at all levels: restrictions that limit how sensitive and confidential information is used can be respected, while at the same time valuable research and evaluation of program outcomes can be completed.

Secure multi-party computation technology allows data analysis and the sharing of a result without actually sharing any underlying sensitive data.

The secure multi-party computation technology allows data analysis and the sharing of a result without actually sharing any underlying sensitive data that might, for example, identify individuals or businesses. The approach performs analysis on data while it remains encrypted. As a result, the technology enables analysis to occur without introducing vulnerabilities that can occur from decrypting data with personal identifiers. Because the identities of individuals are not revealed at any point, the approach simultaneously improves data privacy and usefulness.

How does it work?

The technology processes data stored in distinct datasets that may be located at or controlled by multiple entities, but keeps that data confidential through all stages of analytical processing. Rather than sharing or transferring datasets with a third party, which can introduce risk to the identifiability of records, this technology allows information to be accessed securely and the results are produced without making any individual private information available. This secure privacy-preserving technology mathematically alters information by applying principles from cryptography to allow analysis of the data without revealing sensitive information.

An Example of the Logic of Secure Multi-Party Computation:

Calculating an Average of Hourly Wages

 

As an example of the way the technology works (displayed in the graphic above), suppose Alice, Bob, and Cynthia would like to know the average of their hourly wages without telling each other the actual amounts. Alice, Bob, and Cynthia each know their hourly wage and individually break that amount into four numbers that add to the hourly wage (step 1). They keep one amount to themselves then share one with each of the other two people and share one number with a trusted third-party, a consultant or outside expert. Sharing these individual data points reveals nothing about Alice, Bob, or Cynthia’s actual hourly wage. Alice, Bob, and Cynthia, and the trusted third-party, now each have three pieces of information that they each calculate an average for (steps 2 and 3). Next, Alice, Bob, and Cynthia share their averages with the trusted third-party who compiles the information by adding their own average calculation to the other three averages (step 4). The sum of those four averages produces the final result, the average of Alice, Bob, and Cynthia’s hourly wages. At the end of this process, the average hourly wage is known, but neither Alice, Bob, nor Cynthia have learned each other’s hourly wage.

Why is this technology relevant for evidence-based policymaking?

The technology marks a development in privacy protection that enables the use of data housed within different systems at the same time to examine relationships and to generate statistics. While not a panacea, the approach is especially useful when parties have an incentive to collectively identify the answer to a problem but may not want to share underlying data. For example, a study in Boston was able to use the method to calculate average wages for men and women to provide information on gender pay gaps, without knowing which specific firms have these disparities, and without allowing anyone to learn individual salary information.

Federal agencies may have similar incentives, and some federal laws prohibit sharing of individual or firm records in an identifiable form. Because the technology can produce answers to key questions without revealing identities, the approach holds promise for improved data sharing and analysis across government agencies, levels of governments, and even between private firms and government agencies.

Where has the approach been used successfully?

While introduced in the 1980s, the first large-scale application was attributed to calculating prices for Danish beet farmers in 2008. In that example, farmers needed collective information about beet prices to provide knowledge in advance of an auction of beet contracts. Farmers and buyers reported the price at which they were willing to sell and buy quantities of beets, which established a market price. The approach is also being used in defense and intelligence agencies for analysis of highly secure datasets.

What are the limitations?

While the technology has high potential, it does have limits. The deployment of the approach requires an expert to establish the platform and computer protocols and to execute the analysis. Additionally, because of the complex computer processes involved, running the analysis and obtaining an answer can be computationally intensive.

What is Congress doing to encourage multi-party computation?

Several pieces of legislation have been introduced in Congress that could address multi-party computation approaches in targeted policy areas.

  • In November 2017, following the Commission on Evidence-Based Policymaking’s report, Senators Ron Wyden (D-OR), Marco Rubio (R-FL), and Mark Warner (D-VA) introduced legislation in the Senate, while a bipartisan group in the House introduced counterpart legislation that would encourage multi-party computation for conducting analyses that would make student outcome information more readily accessible from higher education institutions.
  • In July 2018, Representative Kevin McCarthy (R-CA) introduced legislation to create a pilot program for the National Institutes of Health to deploy multi-party computation in hospitals to conduct research on a fungal infection.
  • In January 2018, the House Education and Workforce Committee discussed multi-party computation during a hearing with Paul Ohm, a former member of the Commission on Evidence-Based Policymaking.

Researchers continue to refine and develop the technology, including through pilot projects. Hopefully secure multi-party computation or a similar technology will fully evolve to fill the current void in secure data sharing and analysis.

KEYWORDS: 115TH CONGRESS, BIG DATA, COMMISSION ON EVIDENCE-BASED POLICYMAKING, KEVIN MCCARTHY, MARCO RUBIO, MARK WARNER, RON WYDEN