Ms Catherine Nicholls1
Contributing Author: Nicholas McPhee2
Content contributors: David Groenewegen3, Neil Dickson4, Adrian Tritschler5, Steve Quenette6, David Lam7
1Monash University, Melbourne, Australia, firstname.lastname@example.org
2Monash University, Melbourne, Australia, email@example.com
3Monash University, Melbourne, Australia, firstname.lastname@example.org
4Monash University, Melbourne, Australia, email@example.com
5Monash University, Melbourne, Australia, firstname.lastname@example.org
6Monash University, Melbourne, Australia, email@example.com
7Monash University, Melbourne, Australia, firstname.lastname@example.org
OVERVIEW – WHEN GOOD DATA GOES BAD
Every institution knows that sinking feeling when it comes across research data that has been hanging around for too long, probably with the wrong crowd. When asked to identify itself, the research data shrugs and says it has no owner and doesn’t have to explain what it’s about or where it has come from. In fact, it chooses to live in the shadows. Or maybe it was never made aware that it needed to know this information and provide it when required. Other data may have been identifiable at some point, but was moved around so much that all of its context was lost, and it too has fallen on hard times. It just wants to stay where it is, clogging up critical (and often expensive) space and resources. Amongst this underbelly of rough-house data there can also be golden nuggets of key research data, but sometimes these too fall into the shadows and become hard to find or identify. Missing these golden nuggets (the ‘good citizen’ data) also has consequences for research institutions. So how do we best weed out the bad data and preserve the good?
PURPOSE – A CASE STUDY – WHAT IS MONASH DOING TO ADDRESS THIS?
The aim of this presentation is to discuss a recent Monash case study involving a range of stakeholders and solutions (both technical and procedural/policy-related) that has resulted in some small yet valuable steps forward in tackling these data management issues. In particular, this presentation will highlight how existing information management principles have been modified and enhanced to focus specifically on the e-research space. The aim is to develop scalable policies and procedures for how we sentence data, to help manage data growth into the future. The work to date has been successful due to the collaborative efforts of IT, University Library, eResearch, and Records and Archives staff.
By ‘sentence’ we mean applying a statement of action to the data: that it be deleted, moved by a certain date, or permanently retained. For example, a set of data sentenced on 30/6/2017 with “D2018” should be destroyed (meaning deleted permanently) on or around 30/6/2018.
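As an illustration only, the “D2018” example could be interpreted in code along the following lines. This is a minimal sketch under stated assumptions, not a description of any tool in use at Monash: only the “D” (destroy) code appears in this abstract, so the “M” (move) and “P” (permanent retention) codes, the names `Sentence`, `parse_sentence` and `is_due`, and the rule that the action falls due on the anniversary of the sentencing date are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Only "D" (destroy) comes from the abstract; "M" and "P" are hypothetical
# codes standing in for the "move" and "permanently retain" actions.
ACTIONS = {"D": "destroy", "M": "move", "P": "retain permanently"}


@dataclass(frozen=True)
class Sentence:
    action: str
    due: Optional[date]  # None means no action date (permanent retention)


def parse_sentence(code: str, sentenced_on: date) -> Sentence:
    """Turn a code such as "D2018" into an action and an action date."""
    match = re.fullmatch(r"([DMP])(\d{4})?", code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised sentence code: {code!r}")
    letter, year = match.groups()
    if letter == "P":
        if year is not None:
            raise ValueError("permanent retention takes no year")
        return Sentence(ACTIONS[letter], None)
    if year is None:
        raise ValueError(f"sentence code {code!r} needs a year")
    # Assumed rule: the action falls due on the anniversary of the
    # sentencing date in the stated year (30/6/2017 + D2018 -> 30/6/2018).
    try:
        due = sentenced_on.replace(year=int(year))
    except ValueError:
        # 29 February has no anniversary in a non-leap year; use 28 February.
        due = sentenced_on.replace(year=int(year), day=28)
    if due < sentenced_on:
        raise ValueError("action date falls before the sentencing date")
    return Sentence(ACTIONS[letter], due)


def is_due(sentence: Sentence, today: date) -> bool:
    """True once the action date has been reached."""
    return sentence.due is not None and today >= sentence.due
```

For instance, `parse_sentence("D2018", date(2017, 6, 30))` gives a destroy action due on 30 June 2018. Because the text says “on or around”, a real process would likely treat that date as a trigger for review rather than for automatic deletion.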
THE CASE STUDY – SENTENCING MURDA
The case study will cover the following points:
The big picture
-Brief description of Monash as an institution (number of researchers, major areas of research, etc.).
-Current policy framework and the role of the Monash Agency Working Group (MAWG), including how, in this instance, MAWG brought together staff from a range of University functions (IT, Library, eResearch, and Records and Archives management) to help tackle various data management challenges, such as how to better identify and then sentence research data for retention or disposal.
Identification of specific issues that shaped the case study
-Identification of issues (hardware decommissioning and migration, orphaned data, general data management challenges, growth of data now exceeding our ability to store it, etc.)
-Rationale behind the continuing University Data Lifecycle Project, which is addressing both corporate and research data management needs (the focus of this presentation is on the research space).
-Specific mention of the need in the research space to consolidate orphaned research data collections (e.g. use of a managed repository called MURDA, and the processes built up around it to ensure that a range of things take place over the longer term, including the capture of useful metadata that can then be used to help sentence the data and take action on it).
-Brief discussion of data sources and motivation for the creation of MURDA (e.g. decommissioning of eResearch standalone servers and IBRIX), and an overriding desire not to repeat past mistakes, e.g. to address past metadata failures and to stop over-relying on individual IT staff members to apply sentencing to data on an ad hoc basis. The goal of this case study was to look for ways to support these actions (and staff, and ultimately the researchers themselves) from a higher level, with a more consistent, across-the-board application of policy, procedures and processes around data retention and disposal activities.
Nuts and bolts of case study
-Overview of data sentencing (e.g. data retention and disposal rules) around research data at Monash University.
-Details of ongoing work to sentence research collections held in MURDA (with a specific focus on some of the processes that have been adapted or developed in this case study that lend themselves to being used across the larger e-research space).
-Future plans and enhancements for improving metadata storage/capture (e.g. CRAMS, but also simple specifics such as a searchable field in which to record retention or disposal sentences).
Facing up to and then attempting to wrangle the scale of some of this badass legacy data is not for the faint-hearted. It takes many different kinds of expertise and involves a fair amount of trial and error. But it’s not all blood and guts and gore. Starting the process and dealing with even small amounts of data sentencing can produce some quick wins and reassure those involved that it is worth tackling. Letting the problem continue to grow, and always trying to correct it retrospectively (or leaving it for a ‘future’ staff member to fix), is not prudent in the current world of rapid data growth.
Some additional early reflections include:
-Sometimes the stick is more appropriate than the carrot (i.e. giving researchers specific deadlines and enforcing them is often the only way to force a decision)
-Technology is largely irrelevant when it comes to managing a repository like MURDA (i.e. the storage only has to be effective, not elegant, as the important aspects are the policies and procedures that allow data to be archived in line with good data management guidelines, as well as the metadata that is captured about the collections)
-The importance of gaining visibility over the data cannot be overstated
-The process may start off with IT or other administrative staff overseeing and refining it, but the long game is for researchers to manage some of these tasks themselves (although this will rely on mature tools being available, along with the continued development of appropriate processes and policy to support disposal and retention activities across the board).
Figure 1. Overview of how Monash University supports its researchers with their data management needs.
Catherine is a Records Management professional with over 20 years’ experience working in the tertiary sector, at both Monash University (her current employer) and, previously, the University of Melbourne. Catherine is also a current (part-time) PhD candidate at Monash University. At Monash, Catherine has been fortunate to work with the Monash Agency Working Group (including the Library and eResearch team) to develop today’s presentation.