Last year, as companies launched their generative AI systems for widespread public use, several individual content creators and media companies took legal action to protect their creative works from misuse. Central to these debates are the ethical and legal implications of training large generative AI models and the policies that govern the intricate interplay of AI, data privacy, and copyright. Several legal challenges raise questions about the unauthorized use of personal data in AI model training, where few protections exist. Others concern the copyrights and reputations of individuals in creative industries, including the application of the “fair use” doctrine, which allows specific uses of copyrighted works for education, news reporting, research, and other purposes.
The core issue in these legal challenges lies in the data used to train generative AI models. In general, more data for training or fine-tuning improves a model’s efficiency, precision, or generalization. However, methods like web scraping, which involves the automated extraction of vast volumes of content from websites and online databases, introduce complex issues around privacy, licensing, and transparency. These techniques raise legal and ethical questions about the legitimacy of obtaining the training information in the first place and about how these systems’ outputs are used.
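To make the technique at issue concrete, the following is a minimal sketch of the kind of automated extraction a web scraper performs, using only Python’s standard library. The HTML snippet, class name, and tag choices are hypothetical illustrations, not any company’s actual pipeline; real crawlers fetch pages over the network (e.g., with `urllib.request`) at scale, which is precisely where the consent and licensing questions arise.

```python
from html.parser import HTMLParser

class ArticleTextExtractor(HTMLParser):
    """Pulls the text of <p> elements out of raw HTML — the basic
    building block of scraping text content for a training corpus.
    (Hypothetical example for illustration only.)"""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a paragraph tag.
        if self._in_paragraph and data.strip():
            self.paragraphs.append(data.strip())

# A stand-in for a fetched article page; a real scraper would
# download this HTML from a live website.
sample_html = (
    "<html><body><h1>Headline</h1>"
    "<p>First paragraph.</p><p>Second paragraph.</p>"
    "</body></html>"
)

extractor = ArticleTextExtractor()
extractor.feed(sample_html)
print(extractor.paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

Even this toy version shows why the practice is contested: nothing in the mechanics of extraction checks who owns the text, whether it contains personal information, or whether the site’s terms permit reuse.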
As courts exercise their authority, the role of regulators and policymakers in governing the intersection of AI, privacy, and copyright needs exploration.
Three Legal Challenges
1) Most recently, a lawsuit filed by The New York Times accuses OpenAI and Microsoft of widespread copyright infringement and raises competition concerns. At its core, the lawsuit asserts that both companies used millions of articles published by The Times to train the large language models (LLMs) that power generative AI chatbots such as OpenAI’s ChatGPT and Microsoft’s Copilot. In response, OpenAI wrote that The New York Times is not a significant source of information in its training data and that it aims to prevent “regurgitation” of content. OpenAI also noted it is helping news organizations “elevate their ability to produce quality journalism by realizing the transformative potential of AI.”
2) A proposed class action lawsuit filed in July 2023 alleged Google misused personal and private information to train its AI systems. The plaintiffs, including authors, artists, journalists, and others, argue Google used their data without “notice, consent, or fair compensation.” Google updated its privacy policies in early July to specify that it may “use publicly available information” to train its AI models and tools, such as Google’s new chatbot Bard, and used this as its defense in court, asserting that even the plaintiffs recognize that the training data used was “public.” Google’s motion to dismiss states that the lawsuit “would take a sledgehammer not just to Google’s services but to the very idea of generative AI.” Similar lawsuits have also been filed against OpenAI and Microsoft, raising parallel concerns.
3) Another proposed class action lawsuit, filed by three artists in U.S. federal court in San Francisco, claimed that billions of copyrighted images enable AI text-to-image generator tools from Stability AI, Midjourney, and DeviantArt to create images in those artists’ styles without permission or compensation. Stability AI argued in its defense that it neither stored nor incorporated copyrighted images into its AI system. The judge dismissed most of the claims against Midjourney and DeviantArt because the class members’ images were not registered with the Copyright Office, though the plaintiffs were given 30 days to amend one claim against Stability AI with sufficient evidence. At the time of this blog’s publication, several other lawsuits involving artists, authors, and musicians had raised issues around the “fair use” of copyright-protected works.
Where Policymakers Play a Role
Currently, no federal law addresses these issues specifically in the context of generative AI, and how current laws apply to generative AI, or even AI more broadly, is still up for debate. However, the U.S. has taken positive steps to address the responsible development of AI: the National Institute of Standards and Technology established a voluntary AI Risk Management Framework (RMF) in January 2023, which provides guidance for companies and emphasizes addressing characteristics of trustworthy AI, such as accountability and transparency. According to the Framework, “Maintaining the provenance of training data and supporting the attribution of the AI system’s decisions to subsets of training data can assist with both transparency and accountability. Training data may also be subject to copyright and should follow applicable intellectual property rights laws.” Bipartisan lawmakers introduced bills in both chambers that seek to codify the principles developed in the AI RMF for the use of AI in federal agencies and the private businesses they engage with. Further, the President signed an Executive Order on October 30, tasking several federal agencies with creating standards and guidance to ensure the safety and security of large AI models, including conducting privacy impact assessments, performing safety tests, and watermarking AI-generated content.
The Federal Trade Commission (FTC) also opened an investigation earlier this year into OpenAI for potential violations of consumer protection laws. The investigation centers on whether personal information used to train the LLMs that power these chatbots led to “unfair or deceptive privacy or data security practices” or “unfair or deceptive practices relating to risks of harm to consumers, including reputational harm” under the FTC Act. In a New York Times op-ed, FTC Chair Lina Khan said, “These tools can also be trained on private emails, chats and sensitive data, ultimately exposing personal details and violating user privacy. Existing laws prohibiting discrimination will apply, as will existing authorities proscribing exploitative collection or use of personal data.” The agency’s investigation and engagement with the creative industry indicate the FTC’s intention to enforce existing laws as they might apply to generative AI. However, there is no precedent for this enforcement, and there is opposition to the FTC’s use of its authority in this capacity.
This year, Congress held several hearings addressing these emerging issues around training generative AI models. Training data and personal privacy were major points of concern during two hearings convened by Sens. Richard Blumenthal (D-CT) and Josh Hawley (R-MO), Chair and Ranking Member of the Senate Judiciary Subcommittee on Privacy, Technology, and the Law. Congress introduced several AI-related bills and, notably, a proposed SAFE Innovation Framework promoting transparency.
There is also bipartisan support in Congress to pass comprehensive consumer data privacy legislation, and lawmakers on both sides of the aisle have advocated for a privacy law that would address many of the concerns they have with AI inputs. While several AI-related bills seek to address the outputs of generative AI systems, few directly address the training data in the way a privacy law would. Depending on what policy measures are included in a future privacy bill, it could significantly affect what personal data can or cannot be used in AI systems and how algorithms are trained, designed, and evaluated. Data protection laws abroad already imply protections for data used to train generative AI. For example, the European Parliament’s study on the European Union’s privacy law, the GDPR, determined that “fundamental data protection principles – especially purpose limitation and minimisation – should be interpreted in such a way that they do not exclude the use of personal data for machine-learning purposes.” Whether the U.S. will take a similarly precautionary approach is still being considered. However, as discussed in a recent House Energy and Commerce Subcommittee on Innovation, Data, and Commerce hearing, policymakers on both sides of the aisle believe that data privacy and accountability legislation could ensure trust, responsible use, and innovation in AI.
As these issues play out in the courts, policymakers are also looking to establish governance frameworks that balance AI innovation with the ethical, legal, and societal implications underlying many of these concerns. Regulations aimed at technology are not made in a vacuum, and any policy changes regarding AI must consider the implications for other related areas, such as the effects on users’ data privacy or how AI is used in moderating online content. As technology progresses, so too does the urgency of addressing the multifaceted issues at the intersection of advancing innovation and safeguarding individuals’ rights and privacy.
Ongoing communication between technology experts, legal professionals, advocates, and policymakers is critical to understanding how best to govern continually evolving AI technology and online ecosystems. A myriad of voices and considerations are essential in helping shape comprehensive policy frameworks that will address generative AI’s challenges and safeguard against future risks.