There are various ways to link data. In each case, the records are de-identified, so each person in the data is represented by a randomly generated serial number.
A central part of the ADRN's work was developing high standards for sharing, linking and matching records securely and consistently.
Linking different years
Some data collections come from longitudinal studies, in which the same people appear in annual waves of data over many years. Linking different years within one of these data collections creates a record showing how a person’s life has changed over time.
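As a rough illustration, linking waves amounts to grouping records by each person's serial number and ordering them by year. The sketch below assumes each wave has been flattened into hypothetical `(serial_number, year, status)` tuples; the serial numbers, years and statuses are made up, and real collections carry many more variables.

```python
from collections import defaultdict

# Hypothetical de-identified records from several annual waves:
# (serial_number, year, employment_status). Values are made up.
waves = [
    ("P001", 2015, "employed"),
    ("P002", 2015, "unemployed"),
    ("P001", 2016, "unemployed"),
    ("P002", 2016, "employed"),
    ("P001", 2017, "employed"),
]

def build_timelines(records):
    """Group records by serial number and sort each person's records by year."""
    timelines = defaultdict(list)
    for serial, year, status in records:
        timelines[serial].append((year, status))
    return {serial: sorted(rows) for serial, rows in timelines.items()}

timelines = build_timelines(waves)
for serial, history in timelines.items():
    print(serial, history)
```

Because the serial number is the only link between waves, it has to be assigned consistently across years for the timeline to hold together.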
Linking different sources
The ADRN could link administrative data from different sources, for example:
- linking benefits data and earnings data to health data to investigate the impact of poverty on health
- linking education data to criminal justice data to understand more about how people can be helped out of criminality
It’s also possible to link administrative data with other non-administrative data, such as:
- longitudinal survey data – studying the same people over time
- cross-sectional data – a sample, or cross section, of a population at one time, or without considering differences in time
- other information that puts the data into context: for example, information about a neighbourhood from the Index of Multiple Deprivation or the Income Deprivation Affecting Children Index, or information about schools and universities
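Contextual linkage differs from person-to-person linkage: each individual record is joined to an area-level value rather than to another person's record. A minimal sketch, assuming each record carries a hypothetical area code and that the context is a lookup of one deprivation decile per area; the codes and values are invented for illustration.

```python
# Hypothetical individual-level records, each carrying an area code.
people = [
    {"serial": "P001", "area": "AREA_A", "attainment": 5},
    {"serial": "P002", "area": "AREA_B", "attainment": 3},
    {"serial": "P003", "area": "AREA_C", "attainment": 4},
]

# Hypothetical area-level context, e.g. a deprivation decile per area
# (1 = most deprived). Values are made up for illustration.
area_deprivation = {"AREA_A": 7, "AREA_B": 2}

def add_area_context(records, context, field="deprivation_decile"):
    """Left-join area-level context onto each record; unknown areas get None."""
    linked = []
    for rec in records:
        enriched = dict(rec)  # copy so the input records are left unchanged
        enriched[field] = context.get(rec["area"])
        linked.append(enriched)
    return linked

linked = add_area_context(people, area_deprivation)
```

Keeping records whose area has no match (rather than dropping them) makes gaps in the contextual data visible to the researcher.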
The opportunities for research and policy evaluation are huge, but few examples exist in the UK so far. Thanks to the steps taken by the ADRN, this could soon change.
The linkage process
- Step 1. After a researcher has been trained and their research proposal approved, the Network will negotiate with the data custodians to release the data collections relevant to the project.
- Step 2. When this has been agreed, the data custodians (government departments which gather and hold the data) give each record a unique reference number. They then separate the names, dates of birth and other information that can directly identify people from the data collection.
- Step 3a. The data custodian then sends the data, with unique reference numbers but no identifying information, to one of the Administrative Data Research Centres.
- Step 3b. At the same time, the directly identifying personal information is sent to a trusted third party with the unique reference number for each record – but not the research data.
- Step 4. The trusted third party uses the identifying information to work out which records in the different data collections belong to the same person, and notes the matching unique reference numbers. It then destroys the directly identifying personal information, leaving only the matched unique reference numbers.
- Step 5. An ‘index key’ shows which reference numbers relate to the same person in the separate data collections. The trusted third party sends the index key to the Administrative Data Research Centre.
- Step 6. The Administrative Data Research Centre uses the index key to link the data collections together. They then delete the index key and reference numbers before finally allowing the researcher to see the linked data.
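The separation of roles in these steps can be sketched in a few functions, one per party. This is a toy model under stated assumptions, not the ADRN's actual system: reference numbers are random hex strings from Python's `secrets` module, the trusted third party matches by exact comparison of normalised name and date of birth (real linkage often uses more tolerant or probabilistic matching), the research centre gives each linked person a fresh random serial, and all function names and records are hypothetical.

```python
import secrets

def split_collection(records, id_fields):
    """Custodian (Steps 2-3): give each record a random reference number and split
    it into identifiers (for the trusted third party) and research data (for the
    research centre)."""
    identifiers, research = {}, {}
    for rec in records:
        ref = secrets.token_hex(8)  # hypothetical reference-number scheme
        identifiers[ref] = {f: rec[f] for f in id_fields}
        research[ref] = {k: v for k, v in rec.items() if k not in id_fields}
    return identifiers, research

def match_key(ident):
    # Assumption: exact match on normalised name plus date of birth.
    return (ident["name"].strip().lower(), ident["dob"])

def build_index_key(ids_a, ids_b):
    """Trusted third party (Steps 4-5): match identifiers and return only pairs of
    reference numbers. If two records in ids_b share a key, the later one wins."""
    by_key = {match_key(ident): ref for ref, ident in ids_b.items()}
    index_key = []
    for ref_a, ident in ids_a.items():
        ref_b = by_key.get(match_key(ident))
        if ref_b is not None:
            index_key.append((ref_a, ref_b))
    return index_key

def link_collections(research_a, research_b, index_key):
    """Research centre (Step 6): join research data via the index key, dropping the
    reference numbers and giving each linked person a fresh random serial."""
    linked = []
    for ref_a, ref_b in index_key:
        row = {"serial": secrets.token_hex(8)}
        row.update(research_a[ref_a])
        row.update(research_b[ref_b])
        linked.append(row)
    return linked

# Made-up records held by two hypothetical custodians.
health = [
    {"name": "Ann Smith", "dob": "1980-02-01", "gp_visits": 4},
    {"name": "Bob Jones", "dob": "1975-07-15", "gp_visits": 1},
]
benefits = [
    {"name": "ann smith ", "dob": "1980-02-01", "claimant": True},
    {"name": "Cat Brown", "dob": "1990-11-30", "claimant": False},
]

ids_h, data_h = split_collection(health, ("name", "dob"))
ids_b, data_b = split_collection(benefits, ("name", "dob"))
index_key = build_index_key(ids_h, ids_b)  # third party sees only ids_h, ids_b
del ids_h, ids_b  # third party destroys the identifying information
linked = link_collections(data_h, data_b, index_key)  # centre never sees names
del index_key  # research centre deletes the index key
```

Only the matched person appears in `linked`, and each row holds a fresh serial plus the research variables, with no names, dates of birth or custodian reference numbers.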
Using this system keeps directly identifying personal information and research data separate:
- Trusted third parties only see the identifying information and the reference numbers. They never see anyone’s research data.
- Network staff only see the research data and the index key, never directly identifying personal information.
- Researchers only see the data they have requested, not the index keys or the directly identifying personal information, and only in secure facilities.