By Charlotte Halford & Stuart Hunt


Published 08 March 2024

Overview

Delivering the keynote speech at the IAPP Data Protection Intensive, John Edwards noted the AI and technology focus of the agenda. Although the agenda covered a wide range of topics, he highlighted that the 'biggest question on his desk' was that of artificial intelligence.

The Information Commissioner noted that, despite the numerous questions raised by the development of generative AI models, there is one key issue which must be considered above all: how those models can be developed and used in a way which complies with the principles of the UK GDPR.

Last month we discussed the launch of the ICO consultation series on the application of the UK GDPR to the development and use of generative AI models. The first call for evidence, which closed on 1 March 2024, covered the lawful basis for training generative AI models on web-scraped data[1].

The second call for evidence, which launched on 15 January, focuses on how the principle of purpose limitation should be applied at different stages in the generative AI lifecycle[2]. We explore this second chapter in more detail below.

 

The second chapter

The ICO makes clear that, although generative AI models will have open-ended ambitions, developers still have to consider purpose limitation requirements. Developers will need to demonstrate that they can:

  • Set out sufficiently specific, explicit and clear purposes for each different stage of the lifecycle; and
  • Explain what personal data is processed in each stage, and why it is needed to meet the stated purpose.

In analysing the application of the purpose limitation principle to generative AI models, the ICO concentrates on three issues: the compatibility of reusing training data; one model with many purposes; and defining a purpose.

The compatibility of reusing training data

Due to the expense and complexity of collating training data, developers may choose to reuse existing training datasets many times. If the data is being reused to train a different generative AI model, the developer must consider whether the purpose for training the second model is compatible with the original data collection purpose. If it is not compatible, a new, separate purpose is necessary.

The reasonable expectations of those individuals whose data is being reused will be important. Any subsequent compatibility assessment will be easier where the developer has a direct relationship with the individual(s) whose data is being reused. Where such direct contact is not possible, safeguards such as anonymisation, public messaging and prominent privacy information should help mitigate this risk.

One model, many purposes

A generative AI model can power a range of different applications, such as chatbots, image generation or virtual assistants, and further development of the model after it has been deployed as an application can occur in various scenarios.

The ICO makes clear that it considers that developing the AI model, and any subsequent development of applications based on that model, are different purposes under data protection law.

Defining a purpose

The purpose must be detailed and specific enough that any relevant parties are able to clearly understand how and why the personal data is used. Those parties include the organisation developing the model, the people whose data is used to train the model or processed during deployment, and the ICO.

Relying on a broad purpose will pose difficulties. The example given by the ICO is 'processing data for the purpose of developing a generative AI model'. This wording would struggle to convey the specific processing activities the purpose covers, or to allow the developer to demonstrate why particular types of personal data are needed or how any legitimate interests balancing test is passed.

The early stages of the generative AI lifecycle (such as initial data collection) are likely to be more challenging to define than those closer to deployment. However, defining a purpose at these initial stages will necessitate consideration of the model's intended functionality and the types of deployment that could follow. By contrast, those developing an application based on the model may find it easier to specify the purpose for that processing in more detail.

 

Next steps

The consultation concludes with an echo of points previously made by the Information Commissioner. Consumers must be able to trust AI, and companies will be able to maintain that trust by giving appropriate consideration to the development of AI models and of any subsequent applications based on those models, and by being clear about the types of data used in each case.

In his IAPP speech, the Information Commissioner highlighted that future chapters of the consultation series would set out the ICO's expectations on complying with the accuracy principle, and on accountability and controllership across the supply chain.

In the meantime, the call for evidence on purpose limitation will close at 5pm on 12 April 2024.


[1] https://ico.org.uk/about-the-ico/what-we-do/our-work-on-artificial-intelligence/generative-ai-first-call-for-evidence/

[2] https://ico.org.uk/about-the-ico/ico-and-stakeholder-consultations/ico-consultation-series-on-generative-ai-and-data-protection/
