On 8th March 2024, the Financial Conduct Authority ("FCA") published its report on the use of synthetic data in financial services, following on from its earlier Call for Input and Feedback Statement. Synthetic data is a privacy-enhancing technology (or "PET") that can help address the challenges of sharing sensitive data, such as financial or personal data. The use of synthetic data is expected to grow. Last year, the Information Commissioner's Office ("ICO") issued guidance on the use of various PETs, which included a discussion of synthetic data.
Synthetic data is also considered useful for training artificial intelligence ("AI") tools, because it can provide "statistically realistic but artificial data that can be used to create advanced modelling techniques and train AI models without compromising individual privacy or data protection laws."
The FCA report was authored by the Synthetic Data Expert Group, a sub-group of the FCA Innovation Advisory Group, and focuses on three key themes across the data lifecycle. For each theme, the report gives examples of use cases for synthetic data in financial services.
- Data augmentation and bias mitigation: the transformation of data to expand it and/or reduce the bias inherent in the underlying data used for model generation.
The use cases include generating synthetic transaction sequences for use in fraud detection machine learning models. Notably, the recently published FCA Business Plan for 2024/25 states that the FCA will "continue to develop… use of Artificial Intelligence (AI) to help prevent fraud and scams…" as part of its focus on protecting consumers.
The report finds that synthetic data is beneficial in this context, especially when it augments (rather than replaces) original fraudulent transaction data. However, it also notes that there must be a lawful basis for creating synthetic data from the original data, and establishing that basis can itself be problematic.
The generation of synthetic data to mitigate selection bias in credit scoring training data is also identified as a use case.
- Testing and model validation: the generation of synthetic data to rigorously test the robustness of AI and machine learning systems and validate their purpose under diverse scenarios.
The use cases include generating synthetic transaction data labels, enhancing training sets, and producing transactional data that resembles consumer patterns and behaviours for open banking model testing.
Synthetic data relating to individual banks and fraud typologies can also complement real data to help prevent Authorised Push Payment (APP) fraud. The FCA Business Plan identifies APP fraud prevention as one of the FCA's key target outcomes for 2024/25, so further developments can be expected in this area.
- Internal and external data sharing for fraud controls: the responsible sharing of synthetic data and associated models within an organisation (internal) and/or to support external-facing financial services initiatives (external).
The use cases include shared common data sets for research into societal changes, such as improving cross-border financial crime prevention and pandemic responses. Using shared cross-organisational and international transaction data to develop machine learning models that identify illicit payment patterns will support anti-money laundering controls and increase their effectiveness.
The report emphasises that synthetic data is a valuable resource for the financial services sector, capable of contributing to the range of applications illustrated by the use cases. Taking those use cases as a reference point, there will be a continuing need to ensure regulatory and legal clarity, and to evaluate whether synthetic data is the best approach, whether alone or in combination with other privacy-enhancing technologies.
Financial services firms looking to use synthetic data will also face ethical considerations. The report suggests that they will need internal governance processes, together with internal guidelines, to support responsible and lawful use.
Looking to the future, the report concludes that synthetic data presents a potential solution to data scarcity and quality issues, but questions remain as to when it is ethically permissible to use it. The Synthetic Data Expert Group will continue to encourage the development and deployment of synthetic data in financial services.
The FCA report, Using Synthetic Data in Financial Services, is available on the FCA website.