1 Overview
2 Sanitisation Cases
- 2.1 Internal Source Information on Merged Resources
- 2.2 Patient
  - 2.2.1 Identifier Sanitisation
    - 2.2.1.1 Managing Organization Array Information
- 2.3 Consent Privacy Flags

Overview

In contrast to data enrichment, data retrieved from the aggregated endpoint is often subject to data sanitisation: data which is persisted to the database is removed from the resource(s) which are ultimately returned to the caller.

Sanitisation is typically performed for information governance reasons, to remove internal data elements which are beyond the authorization scope of the request.

Sanitisation Cases

Internal Source Information on Merged Resources

Resources are classed as merged when multiple representations from different upstreams combine into a single representation in the aggregated endpoint. The typical example for this is Patient. When a resource is merged, individual elements are tagged with source information to indicate where they came from, both for audit and functional reasons. The initial source of the merged resource itself is also recorded.

This source information has the potential to reveal details about where a Patient has been cared for which are not relevant to the request for the Patient resource itself. This information is therefore removed.
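The removal step can be pictured with a minimal sketch. It assumes, purely for illustration, that source tags are carried as FHIR extensions with a single known URL. The actual tagging mechanism is not described on this page, and `SOURCE_EXTENSION_URL` and `strip_source_info` are hypothetical names.

```python
# ASSUMPTION: source tags are extensions with this URL (hypothetical value).
SOURCE_EXTENSION_URL = "https://example.org/fhir/StructureDefinition/internal-source"


def strip_source_info(node):
    """Return a copy of a FHIR JSON node with source-tag extensions removed."""
    if isinstance(node, list):
        return [strip_source_info(item) for item in node]
    if not isinstance(node, dict):
        return node  # primitive value, nothing to strip
    cleaned = {}
    for key, value in node.items():
        if key == "extension" and isinstance(value, list):
            # Drop only the source-tag extensions; keep any others.
            kept = [
                strip_source_info(ext)
                for ext in value
                if not (isinstance(ext, dict) and ext.get("url") == SOURCE_EXTENSION_URL)
            ]
            if kept:
                cleaned[key] = kept
        else:
            cleaned[key] = strip_source_info(value)
    return cleaned


# Example: a merged Patient tagged at resource level and at element level.
patient = {
    "resourceType": "Patient",
    "extension": [{"url": SOURCE_EXTENSION_URL, "valueString": "upstream-a"}],
    "name": [
        {
            "family": "Smith",
            "extension": [{"url": SOURCE_EXTENSION_URL, "valueString": "upstream-b"}],
        }
    ],
}
sanitised = strip_source_info(patient)
```

The function builds a new structure rather than mutating its input, so the persisted representation keeps its source tags while the returned copy does not.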

Patient

In addition to the removal of internal source information, Patient is subject to further sanitisation processes.

Identifier Sanitisation

While national identifiers, such as NHS number, are properties of the Patient themselves, many sub-national identifiers are contextualised to a single care context, and therefore inherently grant information about where a Patient is being treated. An example would be a hospital PAS number, or a sexual health clinic number.

Although these identifiers are recognised as important elements of a Patient’s identification process, they represent a vector for the leakage of information that the caller of the Patient resource may not be appropriately authorized to access. They are therefore subject to the following rules:

The Patient, if retrieving their own representation, may always see all of their identifiers.
A different Patient, if retrieving the Patient as an authorized carer or third party, will not see any identifiers, including national identifiers.
A Practitioner or Organization, if retrieving the Patient as an authorized third party, will see only national identifiers, and PKB internal identifiers such as the public_id.
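The three rules above can be sketched as a small filter. This is an illustration under stated assumptions, not the actual implementation: the `Caller` enum and `visible_identifiers` function are hypothetical, the PKB `public_id` system URI is a placeholder, and the NHS number system URI is assumed to be the one used to mark national identifiers.

```python
from enum import Enum

# ASSUMPTION: national identifiers are recognised by this system URI.
NHS_NUMBER_SYSTEM = "https://fhir.nhs.uk/Id/nhs-number"
# ASSUMPTION: placeholder URI for the PKB internal public_id (hypothetical).
PKB_PUBLIC_ID_SYSTEM = "https://example.org/pkb/public-id"


class Caller(Enum):
    SELF = "self"                    # the Patient retrieving their own record
    OTHER_PATIENT = "other_patient"  # an authorized carer or third-party Patient
    PROFESSIONAL = "professional"    # an authorized Practitioner or Organization


def visible_identifiers(identifiers, caller):
    """Return the subset of Patient.identifier entries the caller may see."""
    if caller is Caller.SELF:
        return list(identifiers)  # rule 1: all identifiers
    if caller is Caller.OTHER_PATIENT:
        return []                 # rule 2: none, not even national ones
    # rule 3: national and PKB internal identifiers only
    allowed = {NHS_NUMBER_SYSTEM, PKB_PUBLIC_ID_SYSTEM}
    return [i for i in identifiers if i.get("system") in allowed]


identifiers = [
    {"system": NHS_NUMBER_SYSTEM, "value": "9999999999"},
    {"system": PKB_PUBLIC_ID_SYSTEM, "value": "abc-123"},
    {"system": "https://example.org/hospital/pas", "value": "H0001"},
]
for_professional = visible_identifiers(identifiers, Caller.PROFESSIONAL)
```

In this sketch the hospital PAS number is dropped for a Practitioner or Organization, while the Patient themselves would still receive all three entries.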

Managing Organization Array Information

The managingOrganization array is described in more detail here: Aggregated Data Enrichment | Patient managingOrganization array

This is subject to sanitisation for the same reason as other elements: it provides information about the Patient’s sources of care. This element is only available when the Patient is retrieving their own record.
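As a minimal sketch of this rule, assuming the element appears as a top-level `managingOrganization` key on the Patient JSON and that the caller's relationship to the record has already been resolved (the function name and the `caller_is_subject` flag are hypothetical):

```python
def sanitise_managing_organization(patient, caller_is_subject):
    """Return a copy of the Patient, dropping managingOrganization for non-subject callers."""
    if caller_is_subject:
        return dict(patient)  # the Patient sees their own sources of care
    return {k: v for k, v in patient.items() if k != "managingOrganization"}


patient = {
    "resourceType": "Patient",
    "id": "example",
    "managingOrganization": [{"reference": "Organization/org-1"}],
}
for_third_party = sanitise_managing_organization(patient, caller_is_subject=False)
```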

Consent Privacy Flags

PKB Consent resources generated through the PHR system may have up to four distinct privacy labels denoting the type of access which has been granted. Upon retrieval of a Consent resource through the aggregated API, any privacy labels which have not been granted to the caller will be removed from the Consent resource. This is done so that details of the Patient’s care, e.g. that they have granted sexual health permissions to a particular Organization, are not revealed to a caller who has not been granted the corresponding access.

As with other resources, this sanitisation does not occur when the Patient retrieves their own data.
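The label filtering can be sketched as follows. Where the privacy labels live on the Consent resource is not stated on this page. The sketch assumes they are codings under `meta.security`, and the label codes, function name, and parameters are all hypothetical.

```python
def sanitise_consent_labels(consent, granted_to_caller, caller_is_subject):
    """Return a copy of the Consent keeping only labels granted to the caller."""
    if caller_is_subject:
        return dict(consent)  # no sanitisation for the Patient's own data
    meta = consent.get("meta")
    if not meta or "security" not in meta:
        return dict(consent)  # nothing to filter
    # ASSUMPTION: each privacy label is a coding identified by its "code".
    kept = [c for c in meta["security"] if c.get("code") in granted_to_caller]
    return {**consent, "meta": {**meta, "security": kept}}


consent = {
    "resourceType": "Consent",
    "meta": {
        "security": [
            {"code": "GENERAL_HEALTH"},  # illustrative label codes only
            {"code": "SEXUAL_HEALTH"},
        ]
    },
}
for_caller = sanitise_consent_labels(
    consent, granted_to_caller={"GENERAL_HEALTH"}, caller_is_subject=False
)
```

Here a caller granted only the general health label would not learn that a sexual health permission also exists on the Consent.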

Aggregated Data Sanitisation