In 2020, the Bipartisan Policy Center convened a cross-sector group of experts to imagine an AI National Strategy for Congress. Experts debated issues from training needs to norms for international cooperation, but one issue emerged as the most contentious: impact assessments. These experts disagreed on what an accountability mechanism should look like, and how precisely it should acknowledge and mitigate bias in AI systems. Therefore, for the past year we have focused our efforts on AI impact assessments. We held four convenings with experts from academia, civil society, government, and industry to identify areas of consensus around language, scope, and oversight for an impact assessment framework.
Bodies from the European Union to South Korea have already started to release proposals for AI impact assessments. Now, as the U.S. government considers its own guidance, the Bipartisan Policy Center has identified six common goals shared by AI experts from across sectors.
1) Think beyond “hard law.” Will American companies conform to the European Union’s proposed AI Act, whose top-down approach could require companies to hand over their proprietary source code to regulators? Seeking to balance the trade-off between innovation and safety, panelists preferred the current process taken by the National Institute of Standards and Technology (NIST), which is piloting a voluntary risk-management framework with stakeholders before considering what binding rules might look like.
2) Think beyond a “one and done” document. What happens when a company releases a new credit card, but is accused of assigning a higher credit limit to a husband than his wife? When investigating an AI system for potential bias in credit lending, for example, it helps to think of impact assessments as less a document and more a process. Because AI systems learn, adapt, and iterate, an organization’s process of documenting decisions creates a record that it can rely upon when auditing a system. This not only helps an organization to ensure the system is operating as intended, but if something goes wrong, the organization can better pinpoint the faulty decision or input. Not everything can fit in a single tool or assessment. Therefore, a governance structure with continuous review and testing promotes accountability.
3) Think beyond accuracy as the measure of success. Consider a health services company with a noble goal: identify patients at higher risk of ending up in the emergency room, and give them additional care resources. However, by using health spending as a proxy for health needs, the system could be biased against people of color, who may not spend as much relative to their clinical risk. In 2018, the research firm Gartner predicted that through 2022, 85% of AI projects would deliver erroneous outcomes—largely because a company that tests for only a single performance metric will miss biases in the dataset, the algorithms, or the teams responsible for managing them. Companies should therefore not test for only one metric, often accuracy, but also assess other important qualities such as explainability, transparency, robustness, and security. Predictive tools that decide important life outcomes, from state medical benefits to job opportunities, often produce differential effects across race, age, gender, or sexual orientation. To mitigate this risk, an impact assessment allows an organization to consider additional facets when determining whether the AI system is operating as intended.
4) Think beyond a “one-size-fits-all” general framework. The conversation about an AI system will look different when it’s the U.S. government considering a healthcare or national security application, as opposed to a company considering a movie recommendation application. Organizations grappling with different harms will require tailored impact assessments. While there is value in having a centralized body or standards to help with core requirements (e.g., the General Services Administration’s AI Center of Excellence), an industry- or agency-specific approach gives those closest to the application the power to think through the impact of expanded AI. This context-specific approach allows AI owners to quickly put assessments into practice.
5) Think beyond computer scientists. What if AI literacy were baked into the core academic curriculum, not only in computer science but also in philosophy, sociology, psychology, and other engineering disciplines? These fields may seem unrelated, but collectively they can evaluate the long-term societal impacts of an AI system. Because an impact assessment calls for a range of technical and ethical judgments, it calls for a range of perspectives. Multi-disciplinary education in AI, from high school to graduate school, helps to ensure responsible development and understanding of this modern language.
6) Think beyond deployment. If you are trying to de-bias an AI system after creating it, you have already lost. Instead, organizations should start thinking about bias at “step zero,” from the initial training dataset. An effective governance structure keeps humans in the loop at every stage of the process—design, development, and deployment—and level of the organization—from engineers to executives. The closer an organization is to an “all-hands-on-deck” approach, the better it can identify challenges, problems, and risks throughout the process.
However, we also heard different points of emphasis from each group.
Industry, as the group that would bear the brunt of determining risk, stressed the need for regulators and developers alike to think through complicated issues, from third-party auditing to certifications, before issuing binding rules. For example, panelists discussed how ethics is a useful lens through which to view trade-offs between competing factors (e.g., equity, efficacy, and accuracy). However, they urged the public sector to remember that there is no single ethical framework, and that different ethical lenses lead to different conclusions. More so than other sectors, the industry panel reminded us to think about the complicated trade-offs associated with design decisions. The panelists also highlighted the challenge of creating a culture where employees are attuned to risks, such as reputational harm, that are difficult to quantify but nonetheless vital to an organization’s mission.
Academia called for more investment in the right places, stating that investment in compute and data resources has become overconcentrated in the private sector. Panelists argued that it will harm innovation if we do not invest in academic research to develop new AI breakthroughs. More so than other sectors, panelists from academia stressed the need for guard rails that ensure fairness, privacy, security, and other human and civil rights. Finally, panelists offered examples of emerging technologies, such as homomorphic encryption, that allow AI systems to perform computations while retaining the privacy of user data.
Government professionals believed that the U.S. needed its own voluntary AI approach that is distinct from the E.U.’s approach. Panelists highlighted the value of innovative efforts within the U.S. government—such as the CMS AI Health Outcomes Challenge and GSA AI Center of Excellence—that incentivize responsible AI development through prizes and expertise. Further, panelists advocated for upgrading data.gov to become a home for “AI ready” datasets.
Civil society, more so than other sectors, focused on what needs to happen—from guidance to education—to ensure responsible AI development. Panelists noted that the transparency created by impact assessments should empower different lines of defense within an organization, rather than open the door to corporate backlash and additional liability. Panelists also stressed the role of internationally trusted bodies such as the IEEE, ISO, and ACM in developing risk frameworks.
And finally, these 10 major questions echoed across each group of stakeholders:
- If over-regulation can kill innovation, but lack of regulation can cause harm, what is the right balance to strike?
- How should we assess the risks of different applications, whether the stakes are commercial (e.g., movie recommendation or newsfeeds) or life-and-death (e.g., criminal justice, health care, or national security)?
- How can regulation serve as the best partner to innovation? How much informal learning should take place before formalizing a regulatory structure?
- How can impact assessments become a tool that empowers organizations to feel comfortable identifying risks and potential harms, rather than instill fear of retaliation?
- What is the appropriate balance of technical means (e.g., engineering mechanisms) and non-technical means (e.g., qualitative questions) to ensure responsible system behavior?
- How should impact assessments account for core versus context-specific risks? Is it better to group risks by industry (e.g., retail, telecommunications, etc.) or by use case (e.g., roll-out of services to a community, regardless of industry)?
- How should we harmonize with the EU when the US has a different regulatory approach?
- How can impact assessments create buy-in and escalate risks from relevant stakeholders across an organization (e.g., C-suite, developers, and lawyers)?
- How can we make impact assessments truly operational?
- How should we decide what applications constitute high-risk? Who should decide?