In the IRISS pipeline: A curation tool for integrated data risk assessment
Ryan Perry¹, Weifan Jiang¹, Tina Gregor¹, Janet McDougall¹
¹Australian National University, Canberra, ACT, Australia
Abstract
The Integrated Research Infrastructure for Social Science (IRISS) Project will enable social science researchers to create, disseminate, integrate, and utilise data sources to generate new insights. A key challenge for the project is protecting data privacy throughout these data sharing and access activities.
The Australian Data Archive (ADA) has developed a data risk assessment tool based on established archival processes for preparing and sharing social science data. This includes assessing the risks common to both data integration and panel designs, where the linkage of survey responses is difficult to anticipate. Integrated data requires stringent de-identification to offset the added risk of re-identification that arises when data from different sources are combined.
In this presentation, we will briefly outline the ADA data risk assessment tool, developed with R Shiny. The tool helps users identify both sensitive and identifying data, and it provides recommendations for reducing risk.
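To illustrate the kind of check a tool like this can perform when identifying risky data, the minimal sketch below treats a chosen set of variables as quasi-identifiers and flags every record whose combination of values is shared by fewer than k records (the k-anonymity criterion). This is not the ADA tool's implementation, which is written in R Shiny; the function names, column names, data, and threshold k=3 are all hypothetical.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return Counter(keys)

def flag_at_risk(records, quasi_identifiers, k=3):
    """Return indices of records whose quasi-identifier combination appears fewer than k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [i for i, r in enumerate(records)
            if sizes[tuple(r[q] for q in quasi_identifiers)] < k]

# Hypothetical survey extract: age, postcode, and sex are the assumed
# quasi-identifiers; income is a sensitive variable and is not used for matching.
survey = [
    {"age": 34, "postcode": "2601", "sex": "F", "income": 72000},
    {"age": 34, "postcode": "2601", "sex": "F", "income": 65000},
    {"age": 34, "postcode": "2601", "sex": "F", "income": 80000},
    {"age": 71, "postcode": "2612", "sex": "M", "income": 41000},
]

print(flag_at_risk(survey, ["age", "postcode", "sex"], k=3))  # → [3]
```

Only the last respondent is flagged: no other record shares that combination of age, postcode, and sex, so anyone who knows those three facts could single that person out and learn their income.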
The risk assessment tool will be integrated into the IRISS project curation pipelines, helping to ensure that research-ready versions of data are sufficiently de-identified before release. Where identification risk is present, the tool will guide users through appropriate modifications to the data, which might include aggregation into broader categories or k-anonymisation of unique response combinations. The tool is both a technical and an educational resource, designed to help a wide community of users engage safely and effectively with IRISS.
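Aggregation into broader categories can be sketched as global recoding followed by a re-check of the smallest group size. The example below is a minimal illustration under stated assumptions: the recoding rules (10-year age bands, 3-digit postcode prefixes), the column names, and the data are hypothetical and are not taken from the ADA tool.

```python
from collections import Counter

def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def generalise(records):
    """Hypothetical recoding rules: 10-year age bands and 3-digit postcode prefixes."""
    return [
        {**r, "age": age_band(r["age"]), "postcode": r["postcode"][:3] + "*"}
        for r in records
    ]

def min_class_size(records, quasi_identifiers):
    """Smallest equivalence class; the data are k-anonymous for any k up to this value."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())

qi = ["age", "postcode", "sex"]  # assumed quasi-identifiers
survey = [
    {"age": 31, "postcode": "2601", "sex": "F"},
    {"age": 38, "postcode": "2602", "sex": "F"},
    {"age": 72, "postcode": "2612", "sex": "M"},
    {"age": 75, "postcode": "2615", "sex": "M"},
]

print(min_class_size(survey, qi))              # → 1 (every record is unique)
print(min_class_size(generalise(survey), qi))  # → 2
```

Recoding trades analytical detail for protection, so in practice each variable would be coarsened only as far as needed to reach the target k. Suppressing the few remaining unique combinations is another common option, not shown here.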
This presentation is intended to be delivered as a part of a session on the IRISS project.
Biography
Ryan Perry is Deputy Director at the Australian Data Archive and Senior Research Fellow at the Centre for Social Research & Methods. His research examines the role of personality and ideology in political affiliation and policy support. He completed his PhD at the University of Auckland, New Zealand, in 2014.
Weifan Jiang is a Data Archivist at the Australian Data Archive. He holds a Master of Machine Learning and Computer Vision from the Australian National University. He has expertise in R and Python; his previous work at CSIRO included the analysis and visualisation of large-scale meteorological and digital elevation data.