Interweaving sets of public information can reveal powerful patterns and insights. But can it be done without risking people’s privacy?
When datasets are owned by different organisations, hosted at different physical sites and subject to different access and usage regulations, combining them to draw new research insights can be both complicated and protracted. Professor Chris Dibben and his Scottish Centre for Administrative Data Research (SCADR) team, based at the University's Bayes Centre, have drawn on fifteen years of research to develop some ingenious solutions.
The SCADR team has established six research programmes that develop novel ways for researchers to access data, along with approaches to linking sensitive personal information for professional training and evidence-based policymaking while preserving privacy. These programmes include establishing the legal concept of functional anonymisation, a way of determining whether data is personal or non-personal; Safepods, a physical solution for digital security; and Synthpop, software and statistical methods for producing synthetic population data.
A safe space
To create a safe physical environment for accessing sensitive administrative data, Professor Dibben’s team came up with Safepods. These secure cubicles work much as embassies do for countries that host them in other nations: they allow a data centre to control a space, or ‘pod’, in another location, and give data controllers confidence that legitimate end users are doing only what they are expected to do.
Safeguards, including written agreements and a ban on phones in the pods, ensure that data is not brought into or taken out of the pods and prevent individuals from being re-identified. Because of the pods, users do not need to travel as far to access the data they need safely, which helps to address inequalities in data access and use across different parts of the country.
Another key part of SCADR’s work is the ‘Synthpop’ software, which mimics the broad population characteristics of a real dataset in a way that preserves the ability to draw accurate conclusions without disclosing real data to researchers. “It’s very useful in training or teaching situations,” says Professor Dibben. “For example, it avoids having to ask fifty people to sign a data sharing agreement for a training seminar.”
The research that eventually led to the creation of SCADR began in the mid-2000s with funding from the Economic and Social Research Council. Since then, UK government agencies, the Scottish Government, police forces and the NHS have all made use of SCADR’s work, including in the response to Covid-19. Data innovation has been a major strategic focus for the University of Edinburgh since it launched the Data-Driven Innovation initiative in 2018 as part of the City Region Deal, which aims to position Edinburgh as the data capital of Europe.
For now, data sharing innovation in the UK is largely confined to public institutions. “My centres manage a lot of data – all from public institutions,” says Prof Dibben. “There is no data from private organisations at the moment; however, this is something that we may explore in the future. Importantly, we only, legally, do ‘public benefit research’, and we cannot, therefore, support work that is purely in support of a for-profit organisation.”
Professor Dibben and his team aim to bring data together for public benefit, and to explore how combining datasets could provide valuable insights into a wide range of social issues. By harnessing the potential of multiple datasets and enabling safe access to them, the SCADR team is creating the conditions for new learning opportunities and positive change.