Data protection tools have been developed to ensure that critical data sets containing sensitive personal information, such as those used to track the spread of COVID-19, pass close scrutiny before being shared publicly.
CSIRO, Australia’s national science agency, through its Data61 expert group, has worked with the New South Wales Government, the Australian Computer Society (ACS) and other institutions to develop a privacy assurance tool called the Personal Information Factor (PIF). The tool can assess the risk to personal data in any data set and establish targeted, efficient protection mechanisms.
Traditionally, such assessments have been conducted by leading data and privacy experts. Today, experts can quickly validate data sensitivity assessment results using computer models.
Since 2020, CSIRO has been working with the country’s Cyber Security Cooperative Research Centre (CSCRC) to explore ways to enhance the tool.
01 Use complex data analysis algorithms
The PIF tool uses a sophisticated data analysis algorithm to assess the risk of re-identification in a dataset, that is, whether de-identified personal information can still be matched back to the person it belongs to.
Since March 2020, the NSW Government has been using an earlier version of the tool on the data sets that track the spread of COVID-19 in the state, with the aim of ensuring that the data is properly protected before it is released to the public.
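The article does not describe PIF's actual algorithm. As a rough illustration of what "re-identification risk" can mean, the sketch below uses a much simpler, well-known stand-in: for each record, the risk is taken as 1 divided by the number of records that share the same combination of indirectly identifying attributes. The function name, field names and records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Per-record risk: 1 / (number of records sharing the same quasi-identifier values).

    This is a simple equivalence-class measure used for illustration only;
    it is NOT the PIF algorithm.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


# Made-up, already de-identified case records (no names or addresses).
records = [
    {"age_range": "30-39", "postcode": "2000", "week": "2020-W12"},
    {"age_range": "30-39", "postcode": "2000", "week": "2020-W12"},
    {"age_range": "70-79", "postcode": "2795", "week": "2020-W12"},
]

risks = reidentification_risk(records, ["age_range", "postcode", "week"])
print(risks)  # [0.5, 0.5, 1.0]
```

In this toy data the third record is the only one with its combination of age range, postcode and week, so anyone who knows those three facts about a person could single that record out, even though no name is present.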
Dr Ian Oppermann, Chief Data Scientist at the NSW Government, said, “The PIF tool is currently unique. It is the product of a long-term collaboration and the tireless efforts of state and federal government and industry practitioners.”
“Every day, it is helping us conduct security and privacy risk assessments of anonymized COVID-19 infection datasets in NSW. With its help, we are able to minimize the risk of sensitive information being re-identified before the data is released to the public.”
Dr Oppermann also mentioned that COVID-19 has further heightened public awareness of the need for data privacy.
Dr Oppermann noted, “Given the strong community concern about the growing number of COVID-19 cases, we need to release critical information in a timely manner and at a fine-grained level, detailing when and where new cases of COVID-19 were confirmed. This work also requires us to identify the likely source of infection early in an outbreak and the age range of those infected.”
“We want the data to be as detailed and accurate as possible, while effectively protecting the privacy and identity of individuals associated with these datasets.”
02 Data de-identification methods can further increase the level of privacy
Dr Sushmita Ruj, Principal Investigator and Senior Research Scientist at CSIRO’s Data61, said that new methods of data de-identification are expected to further improve the level of privacy and ensure that personal data is strictly protected.
Dr Ruj mentioned, “After examining a variety of privacy metrics, the research team decided to adopt a unified measure for assessing the risk that a given dataset can be successfully re-identified.”
“PIF keeps exploring ways to counter the various attack methods that can lead to re-identification, and applies tailored protection measures to different datasets accordingly. On this basis, the tool produces a PIF score.”
If the PIF score is above the required threshold, the program recommends how to make the dataset more secure so that it is safe for public release.
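The threshold-and-recommend step can be sketched in a few lines. This is only an illustration under stated assumptions: the score used here (1 divided by the size of the smallest group of matching records) is a stand-in and not the real PIF score, the threshold value is invented, and the only "recommendation" considered is withholding a single column from release.

```python
from collections import Counter

THRESHOLD = 0.5  # hypothetical acceptable score; not a real PIF threshold


def risk_score(records, columns):
    """Stand-in score: worst-case chance of singling out a record (1 / smallest group size)."""
    groups = Counter(tuple(r[c] for c in columns) for r in records)
    return 1.0 / min(groups.values())


def recommend(records, columns, threshold=THRESHOLD):
    """Return the columns whose removal alone brings the score to or below the threshold."""
    if risk_score(records, columns) <= threshold:
        return []  # already acceptable, nothing to recommend
    suggestions = []
    for col in columns:
        remaining = [c for c in columns if c != col]
        if remaining and risk_score(records, remaining) <= threshold:
            suggestions.append(col)
    return suggestions


# Made-up records: every (age_range, postcode) combination is unique.
records = [
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "30-39", "postcode": "2001"},
    {"age_range": "70-79", "postcode": "2000"},
    {"age_range": "70-79", "postcode": "2001"},
]
columns = ["age_range", "postcode"]

score = risk_score(records, columns)
suggestions = recommend(records, columns)
print(score)        # 1.0
print(suggestions)  # ['age_range', 'postcode']
```

Here every record is unique on the two columns together, so the score exceeds the threshold; withholding either column leaves groups of two records and brings the score down to 0.5. A real tool would weigh such changes against the loss of detail in the published data.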
Professor Helge Janicke, Research Director at the Cyber Security Cooperative Research Centre, said that the most important goal is to find a balance between the need for information sharing and the protection of privacy.
Professor Janicke mentioned, “With the help of PIF, all parties can fully understand the risk level, which undoubtedly fills a gap among the available tools.”
“Data analytics has become a well-known technical solution, but it has been difficult to grasp the specific quality of shared outputs. Because of this, PIF plays an extremely important role in assessing, based on indicators, how ethical and responsible the sharing of key data is. With this technology, data owners can comprehensively assess the risks and subsequent impacts associated with data sharing.”
The PIF tool can also be used to examine other datasets to be published. Examples include domestic violence data and public transport usage data collected during COVID-19 social distancing.
CSIRO Data61 and CSCRC will continue to develop the PIF tool and plan to enter the external rollout phase by June 2022.