By Hans Allnut & Astrid Hardy

Published 24 July 2023

Overview

Generative AI tools such as ChatGPT have been in the headlines since 2022, with many users (including children) generating content with little thought as to the associated data protection risks. Last month, we reviewed the various actions being taken by European data protection authorities to ensure that OpenAI, the creator of ChatGPT, is using the personal data obtained via the platform appropriately. Perhaps unsurprisingly, there are now updates from the United States, as the regulatory and litigation backlash against OpenAI takes on a wider international dimension.

The key theme running through the US developments is the huge amount of personal data, allegedly scraped from the web, being used to train OpenAI's products. The investigation and class actions filed to date highlight the moral, legal and ethical concerns surrounding the development of generative AI products.

The past month has seen the commencement of an investigation into OpenAI by the Federal Trade Commission, as well as a clearer picture of how the litigation landscape for artificial intelligence technologies is likely to develop, with the first AI-focused class actions filed. The increasing regulatory pressure on OpenAI and the class action lawsuits are indicative of the legal backdrop against which AI platforms will be operating. We summarise the key developments below:

Federal Trade Commission investigation

The Washington Post has recently reported that the US Federal Trade Commission has commenced an investigation into OpenAI, demanding substantial detail on how OpenAI addresses risks related to its AI models.

The subject of the investigation is whether OpenAI has "(1) engaged in unfair or deceptive privacy or data security practices or (2) engaged in unfair or deceptive practices relating to risks of harm to consumers, including reputational harm, in violation of Section 5 of the FTC Act [which relates to unfair or deceptive acts or practices]".

A copy of the reported FTC Civil Investigative Demand can be found here.

Class actions

We are now seeing a series of class actions commenced in the United States against OpenAI and other companies involved in the development of AI systems. These actions largely concern whether data and images have been improperly obtained and then used in training datasets for AI platforms.

Northern District of California class action

A class action including various alleged complaints has been filed in California against OpenAI and Microsoft. Microsoft is a defendant because it has integrated OpenAI's technologies into some of the software and services it provides (Azure OpenAI Service, Bing, Microsoft Teams). The plaintiffs (claimants), who have asked to remain anonymous, are claiming USD 3 billion. The action is one of the first AI-related class actions not focused on intellectual property infringement. The complaints include violations of unfair competition, privacy, consumer fraud, and deceptive business practices legislation in Illinois and California.

The 157-page complaint, a copy of which can be found here, alleges ongoing harms and threats posed by the products that are the subject of the complaint (ChatGPT-3.5, ChatGPT-4.0, DALL-E, and VALL-E). These harms include privacy violations, misinformation campaigns, malware creation and autonomous weaponry. In respect of the harms suffered by members of the class, it is alleged that OpenAI's business model is based on theft.

It is argued that OpenAI ignored established protocols for the purchase and use of personal information which would be used to train the AI, and instead "systematically scraped 300 billion words from the internet, 'books, articles, websites and posts – including personal information obtained without consent'" [paragraph 146].

It is also alleged that non-users of ChatGPT had their "personal information scraped long before OpenAI’s applications were available to the public, and certainly before they could have registered as a ChatGPT user" [paragraph 159]. The personal data includes account details, names, contact information and payment details, to name a few examples. The class action highlights the privacy concerns arising from the potential exposure of users' and non-users' personal information.

Authors' class action

Class actions were also filed in San Francisco earlier this month by the authors Sarah Silverman, Richard Kadrey and Christopher Golden against OpenAI and Meta. The complaint against Meta is directed at its Large Language Model Meta AI (LLaMA) product, which is not yet publicly available.

It is alleged that OpenAI has been training ChatGPT using datasets "from copyrighted works - including books written by [the author Plaintiffs]… without consent, without credit, and without compensation."  The action against Meta makes similar allegations.

Getty UK/US action against Stability AI

In the UK, Getty Images is pursuing a claim against the AI image generator Stability AI over the allegedly unlawful use of copyrighted photographs to train its image generation tool. The representative action alleges that at least 12 million links to Getty content formed part of the training data. Earlier this year, Getty filed similar litigation against Stability's US operations in Delaware, and a class action complaint has also been filed in San Francisco by artists who allege their work was used to train the Stability AI tool.

Implications for companies in the UK

Depending on the outcomes in the US, there could be real implications for companies globally. If a temporary freeze on access is granted (which is what the plaintiffs ask for in the California class action), the continued availability of ChatGPT and other OpenAI products worldwide would be affected. Alternatively, we could see requirements for tighter security and controls on the use of personal data, which would be welcome but could restrict the pace of development of generative AI tools generally. For example, it is alleged that OpenAI uses personal data collected via web scraping without users' consent; if consent were required in future, it follows that the development of generative AI tools would likely slow.

Conclusion

Any decisions flowing from the US will undoubtedly alter the data practices OpenAI uses to develop its generative AI tools. It is likely that any outcome in the US will have a knock-on effect for regulators and organisations globally.

We will continue to report on developments in this area, in particular the steps taken by the courts and whether these align with the approach of data protection regulators around the world. What remains to be seen is whether the courts will share regulators' concerns about the data protection and privacy risks associated with both the development and the use of these AI tools.
