Current data governance practice in hospitals allows data sharing if certain conditions and criteria are fulfilled. For most Swiss research projects, this includes verification of the following:

Project plan
Patients’ consent
Ethical approval
Legal agreement among project parties
Technical security measures
De-identification of data

The de-identification of health-related data (together with other conditions) is an essential means of protecting patient privacy and a prerequisite for sharing data with a broader research community. Although international guidelines on the de-identification of data exist, there is no guidance on de-identifying health-related data that specifically takes Swiss law and data protection regulations into account. The U.S. Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule establishes national standards to protect individuals' medical records and other personal health information, and Swiss research projects also often refer to HIPAA when documenting their de-identification process. However, the HIPAA Privacy Rule cannot be applied as-is in Switzerland.

SPHN has therefore launched the Data De-identification Project to develop Swiss recommendations for de-identifying health-related data so that it can be shared in compliance with Swiss legal requirements and data protection regulations. The recommendations were elaborated by the Data De-identification Project Task Force [1] in collaboration with additional Swiss university hospital representatives and legal opinion leaders, pooling their experience and knowledge of responsible data sharing.

The following project goals were defined:

Obtaining a legal opinion on the data de-identification approach with respect to the legal requirements in Switzerland.
Development of hands-on guidance for de-identifying data considering i) the overall project specifications following a quantitative and qualitative risk-based approach and ii) the pragmatic mitigation of re-identification risk by producing a set of de-identification rules.
Development of a template to document and justify the approach of de-identification.

Feasibility aspects of implementing the recommended de-identification approaches are considered, but the implementation itself is the responsibility of each institution.

Outcome summary

A guidance document “Data de-identification – phased approach” was created, which outlines the legal context of de-identifying data in compliance with Swiss law and describes the de-identification process in three phases following a risk-based procedure. The phased approach concept aims not only to mitigate the risk of re-identification but also to manage the de-identification process itself, which includes the verification and periodic review of the performed risk assessment. Hands-on support for defining project-specific de-identification measures and assessing re-identification risk is provided by the “Template use case evaluation and risk assessment” (Appendix A to the guidance document), which is to be completed by the project leader according to the project specifications. Both documents have since been revised to simplify the assessment and to reflect their applicability as validated through use cases. The documents are available here:

[1] Members of the Task Force: Julia Maurer (Swiss Institute of Bioinformatics, Personalized Health Informatics, SIB), Sabine Österle (SIB), Jan Armida (SIB), Judit Kiss Blind (SAMW), Michaela Egli (SAMW), Jean-Louis Raisaro (CHUV), Katie Kalt (USZ), Marc Vandelaer (wega Informatik AG), Antje Thien (USZ), Fabian Prasser (BHI at Charité, Germany), Bradley Malin (Vanderbilt University, USA), in collaboration with additional Swiss university hospital representatives.

Swiss legal framework for de-identification of health-related data

To ensure that the de-identification recommendations are in accordance with Swiss legal requirements, the PHI Group requested an independent legal opinion. In its memorandum (Swiss Legal Framework for De-identification of Health-Related Data), Homburger AG describes the process of de-identifying personal data and its key elements under the Swiss Data Protection Act (DPA) and the Human Research Act (HRA), and sets out the de-identification requirements to be met. Moreover, it discusses to what extent the two methods provided by HIPAA, the “Expert Determination” method and the “Safe Harbor” method, meet Swiss legal requirements. The “Expert Determination” method consists of a formal determination by a qualified expert that the risk of re-identification is very small. The “Safe Harbor” method requires the removal of specified identifiers as well as the absence of actual knowledge on the part of the data holder (the HIPAA “covered entity”, e.g., a health care provider) that the remaining information could be used alone or in combination with other information to identify the individual.

In summary, the memorandum concludes that simply removing direct identifiers does not necessarily ensure that the data can be re-identified only with disproportionate effort, because it does not account for the risk of re-identification through the combination of remaining attributes or for other residual risks. Therefore, any rule-based approach has to be combined with a risk assessment in order to satisfy the requirements of Swiss law.

Used terms in the Swiss and international research context

The de-identification process results in coded (pseudonymized) or anonymized data, depending on the method chosen to fit the specifications of the research project (see figure 1).

Figure 1: De-Identification process results in coded or anonymized data.

For the further use of data, it is crucial to consider the legal framework of the country where the data is processed (Table 1). Note that the terms used in the applicable legal regulations may differ while describing similar categories of data, such as coded and pseudonymized data. Conversely, terms such as coded or anonymized data may be used in a different sense by the research parties even though the definition in the applicable law appears clear. Moreover, a distinction must be made between terms describing already de-identified data (coded or anonymized data) and terms describing the process itself (coding/pseudonymization or anonymization), which may be legally defined.

In this context, the European Data Protection Supervisor has published a list of common misunderstandings related to anonymization, which can be found here.

Table 1: Terms & legal regulations concerning further use of data effective in Switzerland, the European Union & the United States of America.

Data are considered truly anonymized if re-identification of a person is possible only with disproportionate effort. Anonymization can include irreversible masking or deletion; for example, the concordance table depicted in figure 1 should be deleted.
Coded or pseudonymized data are de-identified data that are still considered personal data. The process of coding or pseudonymization is reversible: re-identification of the data subject is possible with the corresponding key (concordance table), but is restricted to duly authorized users. Although using coded/pseudonymized data may carry a higher risk of re-identification, it offers researchers advantages that can justify using coded/pseudonymized rather than anonymized data, such as:

Facilitating follow-up research
Avoiding the loss of data value that anonymization would entail
Informing data subjects or their care providers of reportable, incidental findings
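To make the difference between coded and anonymized data more concrete, the following is a minimal Python sketch of a coding step, assuming a hypothetical `CodingService` that replaces a direct identifier with a random code and holds the concordance table separately from the shared data. The class, its methods and the field names are illustrative assumptions, not part of the SPHN guidance; how coding is actually implemented is up to each institution. Note that clearing the table here only removes one route to re-identification: whether the remaining data are anonymized still depends on the risk assessment, since combinations of the remaining attributes may identify a person, and in practice every stored copy of the key would have to be deleted.

```python
import secrets
from typing import Dict


class CodingService:
    """Hypothetical coding (pseudonymization) service.

    Replaces a direct identifier with a random code and keeps the
    concordance table (code -> identifier) apart from the shared data set.
    """

    def __init__(self) -> None:
        self._concordance: Dict[str, str] = {}  # code -> identifier (the key)
        self._issued: Dict[str, str] = {}       # identifier -> code, for stable codes

    def code(self, identifier: str) -> str:
        """Return the code for an identifier, issuing a new random one if needed."""
        if identifier not in self._issued:
            new_code = secrets.token_hex(8)  # 16 hex characters, not derived from the identifier
            while new_code in self._concordance:  # avoid (unlikely) collisions
                new_code = secrets.token_hex(8)
            self._issued[identifier] = new_code
            self._concordance[new_code] = identifier
        return self._issued[identifier]

    def re_identify(self, code: str, authorized: bool) -> str:
        """Look up the identifier behind a code; restricted to authorized users."""
        if not authorized:
            raise PermissionError("re-identification restricted to authorized users")
        return self._concordance[code]  # raises KeyError once the table is deleted

    def destroy_concordance_table(self) -> None:
        """Delete the in-memory key, removing this route to re-identification."""
        self._concordance.clear()
        self._issued.clear()


# Usage: code the (hypothetical) patient identifier before sharing a record.
service = CodingService()
record = {"patient_id": "P-0001", "diagnosis_icd10": "I21.0"}
shared = {**record, "patient_id": service.code(record["patient_id"])}
```

Using random codes rather than codes derived from the identifier means the code itself reveals nothing; only the holder of the concordance table can reverse it.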

A phased approach concept for the de-identification of data

In accordance with the conclusions of the Homburger AG memorandum and taking into account (international) publications on the requirements for de-identification, the Data De-identification Project Task Force has developed recommendations for a phased de-identification approach.

The three-phase de-identification workflow aims to combine risk-based and rule-based approaches, as schematized in Figure 2.

The first phase is dedicated to assessing and mitigating patient re-identification risks. These risks arise from both the research project’s control measures (e.g., data storage location, contracts and policies, cohort profile, IT infrastructure and security) and the data set itself (data types and specific variables). Accordingly, this phase seeks to define and subsequently reduce the research project’s risk profile by introducing appropriate control measures in the project’s context and by specifying appropriate de-identification rules for the dataset variables.

The second phase consists of implementing the de-identification rules defined during the first phase (e.g., replacing a variable value with a pseudo identifier, suppressing a variable value). It is the responsibility of the data provider (i.e., the individual hospital) to specify the implementation of these rules in detail, as they depend on the provider’s internal IT requirements and constraints (e.g., data privacy, information security).
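The two rule types named above can be pictured with the following minimal Python sketch, which applies per-variable rules to a single record. The rule names (`keep`, `suppress`, `make_pseudonymizer`), the variable names and the secret are hypothetical. Keyed hashing with HMAC-SHA256 is only one possible way to derive a pseudo identifier and is an assumption here, not an SPHN prescription; each data provider specifies its own implementation.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict, Optional

Rule = Callable[[Any], Any]


def keep(value: Any) -> Any:
    """Pass the value through unchanged (variable judged acceptable in phase 1)."""
    return value


def suppress(value: Any) -> Optional[Any]:
    """Suppress the variable value entirely."""
    return None


def make_pseudonymizer(secret_key: bytes) -> Rule:
    """Build a rule replacing a value with a keyed pseudo identifier (assumed HMAC-SHA256)."""
    def pseudonymize(value: Any) -> str:
        digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]  # truncated for readability
    return pseudonymize


def apply_rules(record: Dict[str, Any], rules: Dict[str, Rule]) -> Dict[str, Any]:
    """Apply the per-variable rules; variables without a rule are dropped."""
    return {name: rule(record[name]) for name, rule in rules.items() if name in record}


# Usage with hypothetical variables and a hypothetical institution-held secret.
rules = {
    "patient_id": make_pseudonymizer(b"institution-held-secret"),
    "zip_code": suppress,
    "diagnosis_icd10": keep,
}
record = {"patient_id": "P-0001", "name": "Jane Doe", "zip_code": "8001", "diagnosis_icd10": "I21.0"}
shared = apply_rules(record, rules)  # "name" has no rule and is not shared
```

Dropping any variable that has no explicit rule is a deliberate fail-closed choice in this sketch: a variable added to the source record later cannot leak until a rule for it has been defined in phase 1.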

Since a research project’s lifecycle frequently requires adaptations of the data exchanged between provider and recipient (e.g., new variables required) or even of the project context (e.g., a new processor involved), a third phase completes the de-identification workflow. This phase is dedicated to a periodic review of the project and of any modification that may require the overall de-identification workflow (phases 1 and 2) to be run again. Modifications to be considered include not only those inherent to the research project but also external changes, for example technological or organizational developments that alter the initially assessed re-identification risks.

Figure 2: De-identification of health-related data – recommended phased approach. Phase 1 comprises re-identification risk management, assessing and mitigating patients’ re-identification risk. In phase 2, the risk mitigation actions specified in phase 1 are implemented and verified. Phase 3 describes the periodic review of the risk assessment performed according to the project specifications.
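The review logic of phase 3 can be sketched in Python as a comparison between the project context that was assessed and the current one. The `ProjectContext` fields, the set of triggers and the 365-day default interval are hypothetical assumptions for illustration only; the guidance does not fix a review interval. External changes such as new re-identification techniques cannot be detected by such a comparison and still require human judgment during the review.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class ProjectContext:
    variables: FrozenSet[str]   # dataset variables exchanged with the recipient
    processors: FrozenSet[str]  # parties processing the data
    storage_location: str       # where the recipient stores the data


def review_triggers(assessed: ProjectContext, current: ProjectContext,
                    last_review: date, today: date,
                    review_interval: timedelta = timedelta(days=365)) -> List[str]:
    """List reasons to re-run phases 1 and 2; an empty list means no trigger found."""
    reasons: List[str] = []
    new_variables = current.variables - assessed.variables
    if new_variables:
        reasons.append(f"new variables: {sorted(new_variables)}")
    new_processors = current.processors - assessed.processors
    if new_processors:
        reasons.append(f"new processors: {sorted(new_processors)}")
    if current.storage_location != assessed.storage_location:
        reasons.append("storage location changed")
    if today - last_review >= review_interval:
        reasons.append("periodic review due")
    return reasons


# Usage with a hypothetical project: a new variable has been requested.
assessed = ProjectContext(frozenset({"age", "diagnosis"}), frozenset({"Hospital A"}), "CH")
current = ProjectContext(frozenset({"age", "diagnosis", "zip_code"}), frozenset({"Hospital A"}), "CH")
reasons = review_triggers(assessed, current, last_review=date(2024, 1, 1), today=date(2024, 6, 1))
```

Only additions are treated as triggers here, on the assumption that removing variables or processors does not increase the re-identification risk.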

Risk assessment of re-identification (Application of SPHN Data De-identification Guidelines) vs. Data Protection Impact Assessment – What is the Difference?

Relation between Data Protection Impact Assessment and SPHN Data De-Identification Guidelines

The questions raised in a Data Protection Impact Assessment (DPIA) and in the De-Identification Guidelines are related and partially overlap. However, the purpose, scope, responsibilities and triggers for conducting each assessment differ (see below).

Whereas conducting a DPIA is in most cases the responsibility of the Data Protection Officers (DPOs), the “SPHN Use case and risk assessment”, with its selected de-identification rules, is usually carried out by the leaders of the research projects. The “SPHN Use case and risk assessment” is a hands-on tool, provided as an Excel file, that lets researchers define project-specific settings by answering a series of questions and selecting de-identification rules to mitigate the risk of re-identification of data subjects. If the risk profile calculated from the selected answers is too high, it gives researchers guidance on how to further mitigate the re-identification risk. The DPIA is a process designed to help organizations identify and mitigate privacy risks related to personal data processing activities. It contains detailed information about the nature, scope, context and purposes of the data processing, allowing the organization to evaluate whether the processing is necessary and proportionate to its purpose.

Since the FADP classifies health-related data as sensitive personal data (Art. 5 let. c FADP), processing it for research may require a DPIA, and the “SPHN Use case evaluation and risk assessment” can be used as one measure to reduce the risk of re-identification. In particular, the description of the de-identification rules can constitute one important pillar of the measures within the DPIA to mitigate the risk of infringing the data subject’s rights.

Whether it is more practical to carry out the “SPHN use case evaluation and risk assessment” first and then the DPIA, or to conduct both in parallel, depends on the complexity of the project design as well as on the data governance processes and flow of information at the institutions carrying out the assessments.

SPHN Data De-Identification Guidelines

Purpose:

The SPHN De-Identification Guidelines, together with the “SPHN Use case and risk assessment template”, have been developed to identify the residual risk of re-identification of data subjects, taking into account the selected de-identification rules and project-specific settings. The guidelines outline the legal context of de-identifying data in compliance with Swiss law and describe the de-identification process in three phases following a risk-based procedure. The phased approach concept aims not only to mitigate the risk of re-identification but also to manage the de-identification process itself, which includes the verification and periodic review of the performed risk assessment. The “SPHN Use case and evaluation template” must be completed whenever personal data is to be shared with collaborators who will process de-identified health-related data. It serves as documentation and as a summary for regulatory boards evaluating whether the project-specific settings and de-identification approach ensure sufficient data privacy and security in compliance with ethical and legal regulations.

Scope:

The Data De-identification guidelines apply to health-related research projects in which health-related personal data and/or biological material from a natural person (data subject) are used for research purposes. The de-identification of health-related data describes the process resulting in pseudonymized (coded) or anonymized data, ensuring data privacy and security in compliance with Swiss legislation.

The “SPHN Use case evaluation and risk assessment template” assesses and mitigates patient re-identification risks. These risks arise from both the research project’s control measures (e.g., data storage location, contracts and policies, cohort profile, IT infrastructure and security) and the data set itself (data types and specific variables). Accordingly, the template seeks to define and subsequently reduce the research project’s risk profile by introducing appropriate control measures in the project’s context and by specifying appropriate de-identification rules for the dataset variables.

Responsibilities:

The data controller or the organization hosting the personal data is responsible for ensuring that sufficient safeguards are implemented before health-related data is shared and used for project purposes. In most cases, the “SPHN use case evaluation and risk assessment template” is completed by the project leader in collaboration with IT personnel or legal officers.

Data Protection Impact Assessment (DPIA)

Purpose:

According to the revised Federal Data Protection Act (FADP), a data protection impact assessment (DPIA) must be carried out if processing of personal data (not only health-related data) is likely to result in a high risk of violating a data subject's personality or fundamental rights (Art. 22 para. 1 FADP).

The DPIA does not only serve the early identification of significant project risks, assessing the probability of their occurrence and, where they are classified as ‘high’, the severity of their effects. Its practical benefit also lies in documenting the sources and analysis of systemic and security risks in a comprehensible manner and in reducing them, through suitable measures, to a level acceptable from a data protection perspective.

By conducting a DPIA and identifying potential privacy risks associated with data processing activities, the responsible organization demonstrates its commitment to protecting personal data and maintains transparency towards data subjects and regulators. The DPIA supports the principle of accountability by documenting the risk assessment processes related to data protection and privacy.