Guidance for de-identification of health-related data in compliance with Swiss legal and data protection regulations
Current data governance practice in hospitals allows data sharing if certain conditions and criteria are fulfilled. For most Swiss research projects, this includes verification of the following:
- Project plan
- Patients’ consent
- Ethical approval
- Legal agreement among project parties
- Technical security measures
- De-identification of data
The de-identification of health-related data is, together with the other conditions, an essential means of protecting patient privacy and a prerequisite for data sharing among a broader research community. Although international guidelines on the de-identification of data are available, there is no guidance on de-identifying health-related data that specifically takes Swiss law and data protection regulations into account. The U.S. Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule establishes national standards to protect individuals' medical records and other personal health information, and Swiss research projects often refer to HIPAA when documenting their de-identification process. However, the HIPAA Privacy Rule cannot be applied as it stands in Switzerland.
The PHI Group has therefore launched the Data De-identification Project to develop Swiss recommendations for de-identifying health-related data so that the data can be shared in compliance with Swiss legal requirements and data protection regulations. The recommendations were developed by the Data De-identification Project Task Force in collaboration with additional Swiss university hospital representatives and legal opinion leaders, pooling experience and knowledge regarding responsible data sharing.
The following project goals were defined:
- Obtaining a legal opinion that examines the data de-identification approach against the legal requirements in Switzerland.
- Development of hands-on guidance for de-identifying data considering i) the overall project specifications following a quantitative and qualitative risk-based approach and ii) the pragmatic mitigation of re-identification risk by producing a set of de-identification rules.
- Development of a template to document and justify the approach of de-identification.
Feasibility aspects of implementing the recommended de-identification approaches are considered, but the implementation itself is the responsibility of each institution.
A guidance document, “Data de-identification – phased approach”, was created. It outlines the legal context of de-identifying data in compliance with Swiss law and describes the de-identification process in three phases following a risk-based procedure. The phased approach covers not only the mitigation of re-identification risk but also the management of the de-identification process, which provides for the verification and periodic review of the risk assessment performed. Hands-on support for defining the project-specific de-identification and re-identification risk is provided by the “Template use case evaluation and risk assessment” (Appendix A to the guidance document), which is to be completed by the project leader according to the project specifications. The documents are available here:
Members of the Task Force: Julia Maurer (Swiss Institute of Bioinformatics, Personalized Health Informatics (PHI)), Marc Vandelaer (wega Informatik AG), Jean-Louis Raisaro (CHUV), Katie Kalt (USZ), Antje Thien (USZ), Fabian Prasser (BHI at Charite, Germany), Bradley Malin (Vanderbilt University, USA)
Swiss legal framework for de-identification of health-related data
To ensure that the de-identification recommendations are in accordance with Swiss legal requirements, the PHI Group requested an independent legal opinion. In its memorandum*, Homburger AG presents the process of de-identifying personal data and its key elements under the Swiss Data Protection Act (DPA) and the Human Research Act (HRA), and sets out the de-identification requirements to be met. Moreover, it discusses to what extent the two methods provided by HIPAA, the “Expert Determination” method and the “Safe Harbor” method, meet Swiss legal requirements. The “Expert Determination” method is a formal determination by a qualified expert. The “Safe Harbor” method requires the removal of specified identifiers as well as the absence of actual knowledge on the part of the medical professionals that the remaining information could be used, alone or in combination with other information, to identify the individual.
In summary, the memorandum implies that simply removing the direct identifiers does not necessarily mean that the data can be re-identified only with disproportionate effort, since this does not account for the risk arising from combining the remaining data or for other residual risks. Therefore, any rule-based approach has to be combined with a risk assessment in order to satisfy the requirements of Swiss law.
The full memorandum can be found here.
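The risk by combination noted in the summary can be illustrated with a small Python sketch. It is not taken from the memorandum or the guidance document: the records, the variable names (`birth_year`, `zip`, `sex`) and the helper `smallest_group_size` are hypothetical. The sketch counts how many records share each combination of remaining variables; a group of size 1 means that one person is unique on that combination even though no direct identifier is present.

```python
from collections import Counter

# Hypothetical records: direct identifiers are already removed,
# but some indirectly identifying variables remain.
records = [
    {"birth_year": 1950, "zip": "8001", "sex": "F"},
    {"birth_year": 1950, "zip": "8001", "sex": "F"},
    {"birth_year": 1950, "zip": "8001", "sex": "M"},
    {"birth_year": 1987, "zip": "1003", "sex": "M"},
]


def smallest_group_size(rows, variables):
    """Size of the smallest group of rows sharing the same values
    for the given combination of variables."""
    groups = Counter(tuple(row[v] for v in variables) for row in rows)
    return min(groups.values())


# Each variable on its own can look harmless ...
print(smallest_group_size(records, ["sex"]))                       # 2
# ... while the combination singles out individual records.
print(smallest_group_size(records, ["birth_year", "zip", "sex"]))  # 1
```

A group size of 1 does not by itself establish that re-identification is possible without disproportionate effort; that judgment also depends on the project context, which is why such a count can only be one input to the risk assessment.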
Terms used in the Swiss and international research context
The de-identification process results in coded (pseudonymized) or anonymized data, depending on the method chosen to meet the specifications of the research project (see Figure 1).
Figure 1: De-Identification process results in coded or anonymized data.
For the further use of data, it is crucial to consider the legal framework of the country where the data are processed (Table 1). Terms may be named differently in the applicable legal regulations while describing similar categories of data, such as coded and pseudonymized data. Conversely, research parties may use the terms coded or anonymized data in a different sense, even though the definition in the applicable law appears to be clear. Moreover, a distinction must be made between terms describing data that have already been de-identified (coded or anonymized data) and terms describing the process of coding (pseudonymization) or anonymization itself, which might be legally defined.
In this context, the European Data Protection Supervisor has recently published an overview of the most important misunderstandings related to anonymization, which can be found here.
Table 1: Terms & legal regulations concerning further use of data effective in Switzerland, the European Union & the United States of America.
Data are considered truly anonymized if re-identification of a person is possible only with disproportionate effort. Anonymization can include irreversible masking or deletion; for example, the concordance table depicted in Figure 1 should be deleted.
Coded or pseudonymized data are de-identified data that are still considered personal data. The process of coding or pseudonymization is reversible: re-identification of the data subject is possible with the corresponding key (concordance table) but is restricted to duly authorized users. Although using coded/pseudonymized data may carry a higher risk of re-identification, it offers researchers some advantages that justify working with coded/pseudonymized rather than anonymized data, such as:
- Facilitating follow up research
- Avoiding the loss of data value that comes with anonymization
- Informing data subjects or their care providers of reportable, incidental findings
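The role of the concordance table in coding can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure prescribed by the guidance document: the function `code_records`, the field name `patient_id` and the sample rows are hypothetical, and random codes from Python's standard `secrets` module stand in for whatever coding scheme a data provider actually uses.

```python
import secrets


def code_records(rows, id_field):
    """Replace the identifier in each row with a random code.

    Returns the coded rows and the concordance table (code -> original
    identifier), which stays with the data provider.
    """
    concordance = {}  # code -> original identifier
    code_for = {}     # original identifier -> code
    coded_rows = []
    for row in rows:
        original = row[id_field]
        if original not in code_for:
            code = secrets.token_hex(8)
            while code in concordance:  # regenerate on an unlikely collision
                code = secrets.token_hex(8)
            code_for[original] = code
            concordance[code] = original
        coded_rows.append({**row, id_field: code_for[original]})
    return coded_rows, concordance


# Hypothetical sample data; the first and third rows belong to one patient.
rows = [
    {"patient_id": "P-001", "diagnosis": "I10"},
    {"patient_id": "P-002", "diagnosis": "E11"},
    {"patient_id": "P-001", "diagnosis": "J45"},
]
coded_rows, concordance = code_records(rows, "patient_id")

# The same patient keeps the same code, which allows follow-up research.
print(coded_rows[0]["patient_id"] == coded_rows[2]["patient_id"])  # True
# A duly authorized user holding the table can reverse the coding.
print(concordance[coded_rows[1]["patient_id"]])                    # P-002
```

Only the coded rows would be shared with the recipient. Deleting the concordance table removes this route back to the data subject, but it does not address the remaining variables, so it is not sufficient on its own to treat the data as anonymized.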
A phased approach concept for the de-identification of data
In accordance with the conclusions of the Homburger AG memorandum and taking into account (international) publications on the requirements for de-identification, the Data De-identification Project Task Force has developed recommendations for a phased de-identification approach.
The de-identification workflow consists of three phases and aims to combine risk-based and rule-based approaches, as schematized in Figure 2.
The 1st phase is dedicated to assessing and mitigating patient re-identification risks. The risks are inherent to both the research project’s control measures (e.g., data storage location, contracts and policies, cohort profile, IT infrastructure and security) and the data set itself (data types and specific variables). As such, this phase seeks to define and subsequently reduce the research project’s risk profile by introducing appropriate control measures in the project’s context and specifying accurate de-identification rules on dataset variables.
The 2nd phase consists of the implementation of the de-identification rules defined during the 1st phase (e.g., replacement of a variable value by a pseudo identifier, suppression of a variable value). It is the responsibility of the data provider (i.e., the individual hospital) to specify the implementation of these rules in detail, as they depend on the provider’s internal IT requirements and constraints (e.g., data privacy, information security).
Since a research project’s lifecycle frequently requires adaptations of data exchanges between the provider and the recipient (e.g., new variables required) or even of the project context (e.g., new processor involved), a 3rd phase completes the de-identification workflow. This phase is dedicated to a periodic review of the project and of any modification that may require the overall de-identification workflow (phases 1 and 2) to be run again. The modifications to be considered are those inherent to the research project as well as external ones related, for example, to technological or organizational developments that affect the initially assessed re-identification risks.
Figure 2: De-identification of health-related data – recommended phased approach. Phase 1 comprises re-identification risk management, i.e., assessing and mitigating patients’ re-identification risk. In phase 2, the risk mitigation actions specified in phase 1 are implemented and verified accordingly. Phase 3 describes the periodic review of the risk assessment performed according to project specifications.
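As a rough illustration of what applying such rules could look like, the following Python sketch maps each variable to one action. It assumes a hypothetical rule format (`keep`, `pseudonymize`, `suppress`), a hypothetical helper `apply_rules` and invented sample data; actual implementations are defined by each data provider and may differ substantially.

```python
import secrets


def apply_rules(row, rules, pseudonyms):
    """Apply one de-identification action per variable.

    rules      -- variable name -> "keep" | "pseudonymize" | "suppress"
    pseudonyms -- shared dict so a value always maps to the same pseudo identifier
    Variables without a rule are suppressed by default.
    """
    result = {}
    for variable, value in row.items():
        action = rules.get(variable, "suppress")
        if action == "keep":
            result[variable] = value
        elif action == "pseudonymize":
            key = (variable, value)
            if key not in pseudonyms:
                # Random 64-bit code; collisions are negligible at this scale.
                pseudonyms[key] = secrets.token_hex(8)
            result[variable] = pseudonyms[key]
        elif action == "suppress":
            result[variable] = None
        else:
            raise ValueError(f"unknown action {action!r} for variable {variable!r}")
    return result


# Hypothetical rule set, as it might result from phase 1.
rules = {"patient_id": "pseudonymize", "name": "suppress", "diagnosis": "keep"}
pseudonyms = {}
row = {"patient_id": "P-001", "name": "Jane Doe", "diagnosis": "I10", "phone": "000"}

deidentified = apply_rules(row, rules, pseudonyms)
print(deidentified["name"], deidentified["phone"], deidentified["diagnosis"])
```

Suppressing variables that have no rule is a deliberately conservative choice in this sketch: a variable added to the data set later is not released until phase 1 has assigned it a rule.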