Privacy and security
Personal data: Personal data is any information relating to an identified or identifiable living natural person. A natural person is considered identifiable if he or she can be identified, directly or indirectly. (See Personal data)
Special categories of personal data: The following personal data are considered ‘sensitive’ and are subject to specific processing conditions and protections: personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership; genetic data; biometric data processed solely to identify a natural person; health-related data; and data concerning a person’s sex life or sexual orientation. (See Sensitive Personal Data)
Pseudonymised data and pseudonyms: (see GDPR) Pseudonymised personal data (referred to as 'coded data' in previous privacy legislation) are personal data (whether sensitive or not) that can only be associated with an identified or identifiable person by means of a non-public (secret) key (the pseudonym). Pseudonymisation means the processing of personal data in such a manner that they can no longer be attributed to a specific living natural person without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures ensuring that the personal data are not attributed to an identified or identifiable natural person. Pseudonymised personal data are still personal data protected by the GDPR. (See Pseudonymisation)
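As an illustration of the definition above, the minimal Python sketch below derives a pseudonym from an identifier using a secret key. The key name, identifier format, and record layout are hypothetical; in practice the key is the "additional information" of the GDPR definition and must be kept in a separate, secured key-management system.

```python
import hashlib
import hmac

# Hypothetical secret key; must be stored separately from the coded data,
# under technical and organisational measures (e.g. a key-management system).
SECRET_KEY = b"replace-with-key-from-a-separate-key-management-system"

def pseudonymise(identifier: str) -> str:
    """Return a stable pseudonym for `identifier`.

    The same input always maps to the same pseudonym (so coded records can
    still be linked), but without SECRET_KEY the mapping cannot be reversed.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative record: the direct identifier is replaced by the pseudonym.
record = {"id": "85.07.30-033.61", "diagnosis": "positive"}
coded = {"pseudonym": pseudonymise(record["id"]), "diagnosis": record["diagnosis"]}
```

Note that the coded record is still personal data under the GDPR: whoever holds SECRET_KEY can re-attribute it to the original person.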
Anonymous data: With anonymised personal data, the possibilities for identification have been 'irreversibly' removed by means of a processing technique. This is a high bar to clear, and claims of achieving anonymisation should be scrutinised. Data that can be traced back to the original individuals with reasonable effort are not anonymous data but remain personal (pseudonymised) data, and therefore fall under the GDPR. For this reason, it may be difficult to completely anonymise certain types of research data (for example, qualitative data, or large data sets with a wide range of personal data). If a system or app anonymises the data itself, that handling and the anonymisation itself still fall under the scope of the GDPR. A key note is that even if a system succeeds in processing anonymous data, it remains important to keep in mind the ethical aspects of collecting and processing those data.
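By way of illustration only, the sketch below anonymises records by aggregating them into coarse buckets and suppressing small groups, so that no published count can be traced back to an individual with reasonable effort. The threshold k = 5 is an arbitrary assumption, and real anonymisation must also withstand linkage and background-knowledge attacks, which this toy example does not address.

```python
from collections import Counter

K = 5  # assumed minimum group size; below this, a count is suppressed

def aggregate(records, key):
    """Count records per bucket, suppressing buckets smaller than K.

    Generalising values into coarse buckets and dropping rare buckets is one
    basic building block of anonymisation; it is not sufficient on its own.
    """
    counts = Counter(key(r) for r in records)
    return {bucket: n for bucket, n in counts.items() if n >= K}

records = [{"age": a} for a in [23, 24, 31, 33, 34, 35, 36, 37, 71]]
by_decade = aggregate(records, key=lambda r: f"{r['age'] // 10 * 10}s")
# The lone 71-year-old and the two people in their twenties are suppressed.
```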
Privacy is obviously at the centre of the debate, and for good reasons. The reach and the type of data useful for epidemiologic surveillance (it is somewhat unfortunate that a legitimate concept from epidemiology carries a name with such a negative connotation) are intrusive and carry a risk of abuse for mass-surveillance purposes or other forms of so-called function creep.
In the European Union, several human rights instruments guarantee and protect the right to privacy. In particular, the General Data Protection Regulation (‘GDPR’) regulates citizens’ rights to privacy and to data protection. Any solution that infringed the GDPR would be outright illegal, so GDPR compliance is a minimal requirement.
This already imposes strong and clear requirements on any system being built, ensuring the principles of lawfulness, fairness and transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality and, most importantly, accountability.
When designing solutions, the GDPR explicitly requires data protection by design and by default: this forces compliance of the principles relating to processing of personal data at an early stage.
“Only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility.” (GDPR, art. 25)
This does not mean the GDPR is the absolute yardstick for the systems covered by this publication and the processing of personal data therein. Although legitimate under the GDPR, some practices might be rejected by public opinion or by activists on moral or ethical grounds. Sensitivity to those grounds may differ, creating dissonance in the debate.
In light of the above-mentioned intrusive nature of the data required, GDPR compliance should be augmented with agreements on how to process personal data within the GDPR frame, with the aim of fostering public trust in the solution and avoiding function creep.
Therefore, the Appia+ consortium proposes a charter (see [Charter]) to augment the minimal requirements along the following lines:
- Adhere to the joint civil society statement: “States use of digital surveillance technologies to fight pandemic must respect human rights” (see Joint Civil Society Statement)
- The solution is of a temporary nature and aligned with the evolution of the pandemic
- Purposes are limited to responding to the pandemic and phasing out the restrictive measures
- Principle of least privilege
- The way back to normality is known
- Full Transparency (code, data, algorithms)
- Granular user consent, regardless of the legal base
- No mandatory use by citizens
- No access to data except for public health authorities (subject to consent)
- Strictly limited retention period
- No support for law and policy enforcement
- No commercial exploitation
For solutions of this kind, it might be necessary to create legislation in order to enforce those requirements.
The obligation under the GDPR to protect data by design and by default can yield a spectrum of data protection solutions. We identify two models for approaching this under the augmented requirements: the trust-based model and the zero-trust model. The specific requirements for properly implementing data protection differ depending on the model, so we shall identify the specific requirements in each discussed model.
Data protection requirements in a trust-based model
The trust-based model looks for the most privacy-preserving way to gather the minimal required data while keeping them out of the hands of those who could abuse them. The model involves a party (called a Trusted Party) that can be trusted with the data and trusted not to pursue any function creep. Under data protection by design and by default this model is to be considered when there is value in transferring and/or centralising the data or when the data originates in a system not owned by the data subject. Examples of this model are cloud-based services but also tax records, medical records, judiciary systems, ...
The trust-based model has a number of advantages:
- It is easier to combine data through a common pseudonym (taking care not to re-personalise the data)
- It is technically easier to adapt the algorithms processing the data, and reprocess the data, as new versions of the algorithm don’t have to be distributed at each change
- The aggregation of the data can be done after collection and can be customised to the needs and privileges of the data processor and downstream consumers. Aggregation can be made variable so as to keep the maximum achievable detail without re-personalisation.
The trust-based model also has a number of disadvantages:
- Some of the advantages are double-edged, since changing the purpose of the processing remains technically possible
- Prevention of function creep rests on trust that the custodian will not change its mind or be repurposed
- Centralised data might be a more attractive target for attacks and presents a larger attack surface than decentralised data
Safeguards to foresee in such a trust-based model are:
- Proper governance and oversight on the Trusted Party to strictly limit the processing to the stated purpose
- Vetting of the personnel having access to the data
- Applying the Principle of Least Privilege among the personnel of the data processor
- Code of conduct for the personnel, actively enforced
- Strict application of encryption in transit and at rest until the latest possible moment before processing
- Reducing the attack surface to a minimum
- Systematically distrusting any party not under direct control of the Trusted Party (like infrastructure providers)
- Transparency and scrutiny on the security measures
- Pseudonymisation without storage of re-personalisation data
- Shortest possible retention period, preferably rolling
- Applying data minimisation, limit the data centralised
These safeguards are considered requirements for any system claiming to be based on this model.
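The "shortest possible retention period, preferably rolling" safeguard can be sketched as follows. This is a minimal illustration under stated assumptions: the 14-day window, the record layout, and the `purge` function are hypothetical, not part of any specified system.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window for illustration only; the actual period must be
# the shortest one compatible with the stated epidemiological purpose.
RETENTION = timedelta(days=14)

def purge(store, now=None):
    """Return only the records still inside the rolling retention window.

    Run on every ingest (or on a schedule), this keeps the central store from
    ever accumulating more history than the purpose requires.
    """
    cutoff = (now or datetime.now(timezone.utc)) - RETENTION
    return [row for row in store if row["received_at"] >= cutoff]
```

A rolling window implemented this way also limits the damage of a breach: an attacker can never obtain more than the last retention period's worth of data.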
Data protection requirements in a zero-trust model
The zero-trust model looks for ways not to have to trust anyone and defines this as a constraint for any solution. This kind of model is often termed ‘decentralised’ because the personal (non-anonymous) data are kept on devices controlled by the user. Only fully anonymised data are centralised. The reasoning is that when data are not centralised, it is impossible to use them for purposes other than those initially agreed upon. This also reduces the attack surface.
Note: A zero-trust model is an application of the privacy by design principle, but not the sole possible implementation. A trust-based model can also claim to have implemented privacy by design when legitimate use cases cannot be implemented without centralising personal data.
Advantages of the zero-trust model:
- Function creep is impossible
- The attack surface is more fragmented, making an attack more resource-consuming
Disadvantages of the zero-trust model:
- Some use cases that might be legitimate are excluded, like combination with other data sources (which is not possible in the absence of a pseudonym)
- Development and evolution of algorithms is harder when they are distributed to devices outside the direct control of the data processor
- The availability of the system as a whole depends on the availability of decentral components
- The value of the system relies heavily on the cooperation and the availability of the user
Requirements for a zero-trust model:
- The solution must not require trust in any party besides the developer
- The solution must only transfer and centralise truly anonymous data
- The anonymous nature of the data and, where applicable, the method of anonymisation must be transparent and open to scrutiny
- The remaining attack surface must be secured
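To illustrate the zero-trust flow, the hypothetical sketch below reduces on-device contact events to a single anonymous aggregate before anything leaves the device. The event format, the RSSI-based proximity threshold, and the function name are assumptions for illustration, not part of any specified protocol.

```python
def daily_summary(date, events, rssi_threshold=-60):
    """Reduce raw on-device contact events to one anonymous daily figure.

    `events` never leaves the device; only the returned aggregate would be
    transmitted to the central server, which therefore holds no personal data.
    """
    close = sum(1 for e in events if e["rssi"] >= rssi_threshold)
    return {"date": date, "close_contacts": close}

# Raw events stay local; only `payload` would be uploaded.
events = [{"rssi": -45}, {"rssi": -80}, {"rssi": -55}]
payload = daily_summary("2020-04-01", events)
```

Note that even such an aggregate is only anonymous if it cannot be linked back to a device or person (for example through a per-device upload identifier), which is exactly the kind of claim the transparency requirement above puts under scrutiny.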