
OpenAI: Recent investigations and claims in the United States and the EU


By Hans Allnutt & Astrid Hardy


Published 10 May 2024

Overview

Generative AI tools such as OpenAI's ChatGPT have been in the headlines since 2022, with many users (including children) using them to generate content with little thought as to the associated privacy risks. Last year, we reviewed the various actions being undertaken by European data protection authorities in an effort to ensure that OpenAI, the creator of ChatGPT, uses personal data obtained via the platform appropriately. We also provided an update on the United States, as the regulatory and litigation backlash against OpenAI takes on a wider international outlook.

Perhaps unsurprisingly, class actions are still being filed in the US, and European Union regulators are showing growing interest in the regulation of generative AI tools. We summarise the recent key developments below:

Class Actions

We are continuing to see class actions commenced in the United States against OpenAI and other companies involved in the development of AI systems, which largely concern whether data and images have been improperly obtained and then used in training datasets for the AI platforms.

Copyright Class Actions

In December 2023, the New York Times commenced legal proceedings against OpenAI and Microsoft, alleging that they used articles which sat behind its paywall to train and develop their AI models without the New York Times' agreement, infringing its copyright.

On 1 May 2024, a further eight US newspapers followed the New York Times in filing legal proceedings against OpenAI and Microsoft. Those newspapers are all owned by the Alden Global Capital group: The New York Daily News, The Chicago Tribune, The Orlando Sentinel, The Sun Sentinel of Florida, the San Jose Mercury News, The Denver Post, The Orange County Register, and The St. Paul Pioneer Press.

The main allegation is copyright infringement: OpenAI and Microsoft are said to have used copyrighted articles without permission to train their AI tools. A further allegation is that OpenAI falsely attributed inaccurate and misleading reporting to the publications, thereby "tarnishing the newspapers' reputations and spreading dangerous information". This class action provides an insight into the ever-changing relationship between advances in technology and the traditional media.

Many of these class actions are just starting to gather pace. There have been numerous copyright infringement class actions in the US over the past year, and not only against OpenAI: in Getty Images v Stability AI and Andersen v Stability AI, for example, artists claim that Stability AI created a software programme that downloaded millions of copyrighted images to train its AI model. It is a developing area that we will continue to monitor.

In the UK, however, the Financial Times has recently entered into a partnership under which OpenAI can use the publisher's archived content to train and develop its AI model. Many have commented that, in doing so, the publisher is trying to control the content that the AI model can use.

Privacy Class Actions

OpenAI and other generative AI companies are also facing claims for alleged privacy breaches. The allegations in the privacy class actions usually revolve around the collection of publicly available data (i.e., web-scraping) without users' consent.

A recent class action, AS v OpenAI, was filed in the Northern District of California in February 2024 by plaintiffs/claimants who allege that OpenAI "stole private and public information from millions of users" by collecting information from the internet to develop and train its AI tools. The class action highlights the privacy concerns arising from the potential exposure of users' and non-users' personal information. It is one of many that have been filed in the United States and will be one to watch.

Regulatory investigations

Italy launches investigation into OpenAI tool Sora over data protection concerns

Earlier this year, OpenAI introduced a new generative AI offering: Sora, a text-to-video generator. OpenAI promises users the ability to generate videos of up to one minute from short prompts. Sora is currently in its testing phase and not yet available to the public. It is currently unknown whether Sora will only use the images and videos of data subjects that it collects to create videos, or whether there will be any use of biometric data; its privacy policy is currently unclear on the biometric data point. A few weeks after the announcement, the Italian data protection authority launched an investigation into the tool. At present, it is the first and only data protection authority to have done so. As we have seen previously, the Italian Garante is usually the pioneer in Europe in launching investigations into novel AI tools.

The Italian Garante's concerns relate to the nature of the data collected and used to train Sora. In particular, it wants to know whether that data includes personal data, especially any special category data, and how Sora will comply with the EU GDPR when launched. It is also concerned with how Sora will obtain user consent and communicate its data processing activities.

OpenAI was given 20 days to respond to the Italian Garante's investigation, to allow the Garante to assess the potential implications that the tool might have for the processing of personal data within Italy. There has been no public statement from the Italian Garante since, and it is likely that talks with OpenAI are ongoing. The outcome of the investigation is eagerly anticipated.

NOYB files complaint to Austrian Data Protection authority against OpenAI's ChatGPT

NOYB, the Austrian non-profit organisation founded by Max Schrems, has filed a complaint with the Austrian Data Protection Authority (DSB), alleging that OpenAI's tool ChatGPT 'provides false information about people, and OpenAI cannot correct it'. It has asked the DSB to investigate the processing of users' data.

NOYB alleges that ChatGPT's hallucinations make up information about individuals, breaching the EU GDPR's accuracy principle (Article 5), and that OpenAI is unable to correct inaccurate information (Articles 12(3) and 15(1)-(3)). The New York Times has previously reported that "chatbots invent information at least 3 percent of the time – and as high as 27 percent". The complaint was triggered when ChatGPT was asked to provide Max Schrems' date of birth and gave three incorrect dates.

The complaint may be transferred to the Irish Data Protection Authority as OpenAI's EU base is in Ireland, but for now the complaint is with the Austrian DSB. This outcome will be one to watch.

Implications for companies in the UK

Depending on the outcomes in the US and before European regulators, there could be real implications for companies globally, which could affect the continued availability of ChatGPT and other OpenAI products worldwide. Alternatively, we could see requirements for tighter security and controls on the use of personal data, which would be welcome but could restrict the speed of development of generative AI tools generally. For example, it is alleged that OpenAI is using personal data collected via web-scraping without users' consent; should consent be required in future, it follows that this would limit the speed at which generative AI tools are developed.

We will continue to report on developments in this area, in particular steps taken by the courts and whether these align with the approach of data protection regulators around the world. What remains to be seen is whether the courts will agree with the regulators' concerns about the data protection and privacy risks associated with both the development and use of these AI tools.
