Mr Dave Connell¹
¹Australian Antarctic Division, Kingston, Australia
At the Australian Antarctic Data Centre (AADC), data management rests on a “three-legged stool” of applications designed for users and administrators. These three applications form the basis of a successful data management strategy, which has been gradually refined since the AADC came into existence over twenty years ago.
These three applications consist of a science project management tool, a metadata creation and discovery tool, and a data submission tool.
THE SCIENCE PROJECT MANAGEMENT TOOL – MYSCIENCE
The AADC was established in 1995 to facilitate effective data management within the Australian Antarctic program (AAp). Since then the AADC has created infrastructure to support the archival of data and the creation of metadata records, and has added value to Australian Antarctic data through GIS and mapping tools, targeted databases, and data-analysis activities. However, until 2012 the AADC lacked the capability to directly manage the data archival needs of each AAp science project. In that year, the AADC launched the MyScience project management application (Finney 2014), which allowed AADC staff to efficiently keep track of AAp science projects, ensured that all expected datasets were accounted for, and ensured that AAp scientists were not unduly pestered for data before the expected due dates.
MyScience achieves this goal primarily through the use of Data Management Plans (DMPs). Early in the project application phase, scientists are required to complete a DMP for their project in order to inform the AADC what data to expect from the project and when. This creates a “shopping list” of expected data for each project, which the AADC can progressively check off. It also allows AADC staff to objectively rate each project, and its responsible scientist, on data management effectiveness. These ratings can be collated into reports and presented to the AAp funding office for further evaluation. Funding for future projects can then be prioritised to scientists who fulfil their data management obligations.
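The “shopping list” idea can be illustrated with a minimal sketch. This is not the MyScience implementation: the class names, the project identifier, the dataset titles, and the simple compliance ratio are all hypothetical, chosen only to show how a DMP might be checked off and rated.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ExpectedDataset:
    """One entry on the DMP 'shopping list' (hypothetical structure)."""
    title: str
    due: date
    received: Optional[date] = None  # None until the dataset is archived


@dataclass
class DataManagementPlan:
    project_id: str
    expected: List[ExpectedDataset] = field(default_factory=list)

    def check_off(self, title: str, received: date) -> None:
        """Mark an expected dataset as received."""
        for item in self.expected:
            if item.title == title:
                item.received = received
                return
        raise KeyError(f"{title!r} is not listed in the DMP")

    def overdue(self, today: date) -> List[str]:
        """Titles whose due date has passed without a submission."""
        return [d.title for d in self.expected
                if d.received is None and d.due < today]

    def compliance(self, today: date) -> float:
        """Fraction of datasets already due that have been received."""
        due = [d for d in self.expected if d.due < today]
        if not due:
            return 1.0  # nothing is due yet, so nothing is outstanding
        return sum(d.received is not None for d in due) / len(due)


# Hypothetical project with three expected datasets.
plan = DataManagementPlan("EXAMPLE-0001", [
    ExpectedDataset("Seabird counts", date(2016, 6, 30)),
    ExpectedDataset("CTD profiles", date(2016, 12, 31)),
    ExpectedDataset("Ice core chemistry", date(2018, 1, 1)),
])
plan.check_off("Seabird counts", date(2016, 5, 1))

today = date(2017, 3, 1)
print(plan.overdue(today))     # datasets to chase up
print(plan.compliance(today))  # a simple per-project rating
```

Because only datasets whose due date has passed count toward the rating, scientists are not penalised, or pestered, for data that is not yet due.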
Furthermore, the MyScience application is linked to the other two legs of the stool, the metadata tool and the data submission tool, providing a “one stop shop” for scientists when it comes to the management of their data.
THE METADATA TOOL
Metadata is a crucial element of the AADC's data management strategy, for without well-written metadata, many of the datasets stored at the AADC would be of little value. As such, providing a simple method for scientists and AADC staff to write metadata records was of great importance. For many years, due to limitations with technology and metadata standards, this was not possible. More often than not, a low-tech, labour-intensive approach was required to produce metadata of an adequate standard. The current metadata tool used by the AADC was released in 2015, and has proven to be very successful at delivering high-quality metadata records. It simplifies the metadata creation process for users, yet retains all the required complexity and detail, which minimises the effort required by AADC administrators to evaluate and process the records.
THE DATA SUBMISSION TOOL
The final leg of the stool was to provide a way for users to reliably and safely upload their data to the AADC. The first data submission tool was released in 2008, but while well intentioned and well thought out, it was not well designed, was poorly developed, and was limited in capability. The tool did not integrate well with other AADC applications, and grew ever more buggy before it became irretrievably broken in 2015. A replacement tool was finally released in 2017, and unlike its predecessor has thus far proven to be well designed, well developed and very capable.
The new data submission tool has been tightly integrated into both the MyScience application and the metadata tool for ease of use and more reliable reporting by AADC staff. The new tool also allows much larger files to be uploaded to the AADC, and ensures that datasets do not become “lost in an inbox”.
IS THERE A CUSHION ON THE STOOL?
While these three applications form the basis of the administration of data management at the AADC, there are of course other factors which “sweeten” the user experience and ease the burden on administrators. These sweeteners include: dataset DOIs for an increased citation presence; search tools for downloading publicly accessible data; value-adding to the data; and integration with other metadata catalogues and data repositories for increased exposure on the world stage.
IS THE STOOL WOBBLY?
Despite the success of these three applications in enhancing data management practices within the AADC, there is still room for improvement. The data submission tool and the MyScience application need to be further linked so that when datasets are submitted to the AADC they are automatically checked off the DMP; reporting mechanisms need to be improved so that AADC data managers can more effectively manage the repository; and thought needs to be given to whether data access should remain primarily limited to downloading “flat files”, or should evolve to incorporate an integrated, multi-dataset service.
And while the AADC has made a very nice stool, the AADC can’t force people to sit on it – some scientists exploit a policy loophole in the AAp which allows them to collect data under a “non-science” project, which comes with no obligation to archive the data. Furthermore, despite a concerted effort to make data management as easy as possible, the AADC has been unable to achieve 100% compliance when it comes to data archival.
- Finney K (2014) Managing Antarctic Data – A Practical Use Case, Data Science Journal, 13 PDA8-PDA14, doi.org/10.2481/dsj.IFPDA-02
Dave Connell completed a Bachelor of Science (honours) degree at the University of Tasmania, and has been working at the Australian Antarctic Division since 1998 and as the metadata officer since 1999. His role is to catalogue and archive all scientific data collected by the Australian Antarctic program – specifically to ensure that scientists write high quality metadata records and archive their data in a timely manner. During his time at the AAD, he has overseen the transition from ANZLIC metadata to DIF metadata, and also developed tools for converting DIF metadata into various profiles of the ISO 19115 metadata standard. Dave is also very active in the Australian Government metadata space – reviewing and adapting ISO 19115 metadata standards for use in Australian scientific organisations. He has also worked with the Ocean Acidification – International Coordination Centre to develop an ocean acidification metadata profile.