R Shiny as a data risk assessment and education tool for the Australian Data Archive

Ms Janet McDougall¹, Dr Masud Hasan¹, Dr Ryan Perry¹

¹Australian National University, Canberra, Australia

Public data repositories have an important role in facilitating open science. A challenge for repositories, though, is ensuring both the public availability of data and the privacy of research participants.

Both the Australian Data Archive (ADA) and data owners are bound by the Privacy Act 1988. Data owners are encouraged to de-identify their data prior to submission, but archivists are often required to guide data owners through this specialist process to ensure compliance with the Privacy Act and with ADA procedures for handling sensitive data.

We have been developing standardised data risk assessment procedures in R that automate the archivist data processing workflow and produce a Data Risk Assessment Report for data owners. Having developed this R code, we now aim to use R Shiny to deliver it as an educational and self-assessment tool for data owners (including depositors and the research community more broadly).

The R Shiny data app we are developing, the ADA DRAT (Data Risk Assessment Tool), will guide users to assess their own data against a series of privacy criteria in preparation for submission. The DRAT will include pop-up educational suggestions and definitions of data privacy risk, as well as example output using live synthesised data to reflect the de-identification options selected by the user.

In this presentation we will demonstrate current tool capabilities: data cleaning, flagging direct and indirect identifiers, and generating standardised vocabulary suggestions.


Biography:

Janet McDougall is a Senior Data Archivist with the Australian Data Archive, ANU. Her background includes mainframe IT, data management, GIS, and social research (ANU MSR Adv). Janet specialises in archival workflows and sensitive data: curation, preservation, technical implementation, controlled vocabularies, and metadata standards.

Dr Masud Hasan is a researcher at the ANU College of Health & Medicine. His research includes modelling epidemiological impacts of changes in climate. His expertise includes development of data processing code in R. He completed his PhD in 2011 at The University of the Sunshine Coast.

Dr Ryan Perry is an archivist at the Australian Data Archive, ANU. He specialises in archiving national, longitudinal health and economic data. His research examines the role of personality and ideology in political affiliation and policy support. He completed his PhD at the University of Auckland, New Zealand in 2014.
