Planting SEEDS to Maximise Data Potential

Margie Smith1, Heather Riley1

1Geoscience Australia, Canberra, Australia, heather.riley@ga.gov.au

 

Geoscience Australia is Australia’s pre-eminent public sector geoscience organisation and the nation’s trusted advisor on geoscience and spatial data.  A key objective for Geoscience Australia is to transform this data into information and knowledge in order to address important national issues, and deliver a broad range of products that assist Government and the community to make informed decisions about the use of natural resources, management of the environment, and community safety.

In order to maintain its reputation as the custodian of geoscientific and spatial data and knowledge, Geoscience Australia must strive to maximise data’s potential. ‘Maximising Data Potential’ is one of the four pillars of the OneGA vision. To achieve this pillar, Geoscience Australia has recently developed and endorsed an enterprise Data Strategy.

The Geoscience Australia Data Strategy was developed through a rigorous consultation process in order to analyse the current state of data at the organisation, and understand the future directions the organisation needs to take to achieve its vision.

From these consultations and analysis it became clear that Geoscience Australia holds a wealth of data and information, but to truly maximise data potential, strong foundations for data and its management need to be laid. The vision for data at Geoscience Australia is for data to be:

  • Accessible
    • Data is open and can be easily retrieved when required
  • Discoverable
    • Data can be found easily and when required
  • Reusable
    • Data can be used again and again, in ways beyond its original intention
  • Managed throughout its lifecycle
    • Data is not managed at a ‘point in time’ during the scientific process; it is part of the scientific process
    • Data is managed to ensure ongoing value can be derived
  • Trusted and quality is well described
    • The strengths and limitations of the data are transparent to users
    • Users have confidence in the data and information Geoscience Australia provides

For Geoscience Australia to successfully maximise data potential, a number of objectives will need to be achieved. These objectives can be referred to as SEEDS.

The SEEDS are:

  • Streamline data processes, systems and tools
  • Embed best practice data management
  • Encourage and reward data management
  • Develop data capabilities
  • Strengthen and embed data governance

For a seed to grow and flourish, the right conditions need to be provided. The Geoscience Australia Data Strategy is planting seeds for change, and providing conditions under which data at Geoscience Australia can grow and flourish so its potential can be maximised.

Towards ‘end-to-end’ research data management support

Mrs Cassandra Sims1

1Elsevier, Chatswood, Australia, c.sims@elsevier.com

 

Information systems supporting science have come a long way and include solutions that address many research data management needs faced by researchers, as well as their institutions. Yet, due to a fragmented landscape and even with the best solutions available, researchers and institutions are sometimes missing crucial insights and spending too much time searching, combining and analysing research data [1].

With this in mind, we are working on holistically addressing all aspects of the research lifecycle, as shown in Figure 1. The research lifecycle starts with the design phase, when researchers decide on a new project, prepare their experiments and collect initial data. It then moves into the execution phase, when experiments are run and research data are collected, shared within the research group, processed, analysed and enriched. Finally, research results are published and the main research outcomes are shared within scientific community networks.

Figure 1: Research lifecycle

Throughout this process researchers use a variety of tools, both within the lab as well as to share their results. Research processes like this happen every day. However, there are no current solutions that enable end-to-end support of this process for researchers and institutions.

Many institutes have established internal repositories, which have their own limitations. At the same time, various open data repositories [2] have grown with their own set of data and storage/retrieval options, and many scholarly publishers now offer services to deposit and reference research datasets in conjunction with the article publication.

One challenge often faced by research institutes is developing and implementing solutions to ensure that researchers can find each other’s research in the various data silos in the ecosystem (i.e. assigning appropriate ontologies, metadata, researcher associations). Another challenge is to increase research impact and collaboration both inside and outside their institution to improve quantity and quality of their research output.

Making data available online can enhance the discovery and impact of research. The ability to reference details, such as ownership and content, about research data could assist in improved citation statistics for published research [3]. In addition, many funders increasingly require that data from supported projects is placed in an online repository. So research institutes need to ensure that their researchers comply with these requirements.

This talk will describe a suite of tools and services developed to assist researchers and institutions with their research data management needs [4]. The suite covers the entire spectrum, from data capture through to making data comprehensible and trusted, enabling researchers to receive proper recognition and institutions to improve their overall ranking by going “beyond the mandates”.

I will explain how it integrates through open application programming interfaces with the global ecosystem for research data management (shown in Figure 2), including:

  • DANS [7] for long-term data preservation,
  • DataCite [5] for DOIs and indexed metadata to help with data publication and inventory,
  • Scholix [6] for support of links between published articles and datasets,
  • More than 30 open data repositories for data discovery.

Figure 2: Integration with the global research data management ecosystem

The talk will conclude with an overview of current data sharing practices and a short demonstration of how we incorporate feedback from our development partners: University of Manchester, Rensselaer Polytechnic Institute, Monash University and Nanyang Technological University.

REFERENCES

  1. de Waard, A., Cousijn, H., and Aalbersberg IJ. J., 10 aspects of highly effective research data. Elsevier Connect. Available from https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data, accessed 15 June 2018.
  2. Registry of research data repositories. Available from: https://www.re3data.org/, accessed 15 June 2018.
  3. Vines, T.H. et al., The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 2014, 24(1): p. 94-97.
  4. Elsevier research data management tools and services. Available from: https://www.elsevier.com/solutions/mendeley-data-platform, accessed 15 June 2018.
  5. DataCite. Available from: https://www.datacite.org/, accessed 15 June 2018.
  6. Scholix: a framework for scholarly link exchange. Available from http://www.scholix.org/, accessed 15 June 2018.
  7. Data Archiving and Networked Service (DANS). Available from: https://dans.knaw.nl/en, accessed 15 June 2018.

Biography:

Senior Research Solutions Manager ANZ

Cassandra has worked for Elsevier for over 6 years, as Product Solutions Manager APAC and currently as Senior Research Solutions Manager ANZ. Cassandra has demonstrated experience and engagement across the Academic, Government and Health Science segments in the region, working with Universities, Government Organisations, Local Area Health Districts, Funders and Industry, to assist in the development of business strategies, data asset management and core enterprise objectives. Specialising in detailed Analytics, Collaboration Mapping and Bibliometric Data, Cassandra builds on her wealth of knowledge in these areas to assist our customer base with innovative and superior solutions to meet their ever-changing needs. Cassandra has worked with the NHMRC, ARC, MBIE, RSNZ, AAMRI and every university in the ANZ region. Cassandra is responsible for all new business initiatives in ANZ and for supporting strategic initiatives across APAC.

Making the University of Adelaide Magnetotellurics data collection FAIR and onto the path towards reproducible research

Nigel Rees1, Ben Evans2, Graham Heinson3, Jingbo Wang4, Lesley Wyborn5, Kelsey Druken6, Dennis Conway7

1The Australian National University (NCI), Canberra, Australia, nigel.rees@anu.edu.au

2The Australian National University (NCI), Canberra, Australia, ben.evans@anu.edu.au

3The University of Adelaide, Adelaide, Australia, graham.heinson@adelaide.edu.au

4The Australian National University (NCI), Canberra, Australia, jingbo.wang@anu.edu.au

5The Australian National University (NCI), Canberra, Australia, lesley.wyborn@anu.edu.au

6The Australian National University (NCI), Canberra, Australia, kelsey.druken@anu.edu.au

7The University of Adelaide, Adelaide, Australia, dennis.conway@adelaide.edu.au

 

Magnetotelluric (MT) data in the research community is traditionally stored on departmental infrastructures and, when published, the data is in the format of processed esoteric downloadable files with limited metadata. Obtaining the source raw MT time-series data is a lengthy process: one would typically have to email the data owner, and transfer would be either via FTP download for local processing or, in some cases where the file sizes are too large, on hard disk via Australia Post.

It has become increasingly apparent to the MT community that in order to increase online collaboration, reduce time for analysis, and enable reproducibility and integrity of scientific discoveries both inside and beyond the MT community, datasets need to evolve to adopt Findable, Accessible, Interoperable and Reusable (FAIR) data principles. The National Computational Infrastructure (NCI) has been working with The University of Adelaide to address these challenges as part of the 2017-2018 AuScope-ANDS-NeCTAR-RDS funded Geoscience Data Enhanced Virtual Laboratory (DeVL) project. The project aims to make the entire University of Adelaide MT data collection (from 1993-2018) FAIR. NCI have also added an assortment of MT processing and modelling software on both their Virtual Desktop Infrastructure and Raijin Supercomputer, which has helped to reduce data processing and subsequent modelling times.

The University of Adelaide MT data collection needs to be both discoverable and accessible online, and conform to agreed international community standards to ensure interoperability with other international MT collections (e.g., AusLAMP [1], EarthScope USArray [2], SinoProbe [3]), as well as reusability for purposes other than what the data was collected for. For the process to become more transparent, the MT community will need to address fundamental issues including publishing FAIR datasets, publishing model outputs and processing regimes, re-evaluating vocabularies, semantics and data structures, and updating software to take advantage of these improvements. For example, it is no longer sufficient to only expose the processed data; the raw instrument data needs to be preserved persistently so that as algorithms improve, the original source data can be reprocessed and enhanced. Consistent with the FAIR and reproducibility principles, the MT processing and modelling tools should also be easily discoverable and accessible and where required usable in online virtual environments, with software versions citable. The journey from the raw-data to the final published models should be transparent and well documented in provenance files, so that published scientific discoveries can be easily reproduced by an independent party.

One of the components of this project has been to explore the value of converting raw MT time-series into open scientific self-describing data formats (e.g., Network Common Data Form (netCDF)), with a view to showing the potential for accessibility through data services. Such formats open up the ability to analyse the data using a much wider range of scientific software from other domains. As an example, Jupyter Notebooks have been created to show how the MT data can be accessed and processed via OPeNDAP data services. These changes alone will aid in the usability of the data, which can be accessed without having to explicitly pre-download the data before commencing any analysis.
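As a hedged illustration of the data-service access described above, the sketch below builds an OPeNDAP (DAP2) request URL that subsets one variable by index range, so that only the requested slice is transferred rather than the whole file. The server address, file path and variable name (`ex`) are hypothetical placeholders, not the project's actual endpoints or variable names.

```python
from urllib.parse import quote


def opendap_subset_url(base_url, variable, start, stop, stride=1, response="ascii"):
    """Build a DAP2 URL requesting variable[start:stride:stop] (inclusive indices)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    # DAP2 hyperslab syntax is [start:stride:stop]; keep brackets and colons literal.
    constraint = f"{variable}[{start}:{stride}:{stop}]"
    return f"{base_url}.{response}?{quote(constraint, safe='[]:')}"


# Hypothetical dataset location and variable name, for illustration only.
EXAMPLE_URL = opendap_subset_url(
    "https://example.org/thredds/dodsC/mt/site01.nc", "ex", 0, 999
)
print(EXAMPLE_URL)
```

In a notebook, a library such as netCDF4 or xarray can typically open the dataset URL directly and apply equivalent slicing, which is the more usual route; the URL form is shown here only because it makes the server-side subsetting explicit.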

The Geoscience DeVL project has focused on making the University of Adelaide MT data available online as well as assembling software and workflows available in a supercomputer environment that significantly improve the processing of data. This project has also made a valuable addition to the AuScope Virtual Research Environment, which is progressively making more major Earth science data collections, software tools and processing environments accessible to the Australian Research community. The results of our work are also being presented at international MT forums such as the 24th EM Induction Workshop [4] held in Helsingør, Denmark, to ensure that the data capture, publishing, curation and processing being undertaken at NCI is in line with best practice internationally.

ACKNOWLEDGEMENTS

This work was supported by the National Computational Infrastructure, AuScope Limited, ANDS-NeCTAR-RDS and The University of Adelaide.

REFERENCES

  1. The Australian Lithospheric Architecture Magnetotelluric Project (AusLAMP). Available from: http://www.ga.gov.au/about/projects/resources/auslamp , accessed 21 June 2018.
  2. The EarthScope USArray magnetotelluric program. Available from: http://www.usarray.org/researchers/obs/magnetotelluric , accessed 22 June 2018.
  3. SinoProbe – Deep Exploration in China. Available from: http://sinoprobe.cags.ac.cn/About-Sinoprobe/ , accessed 22 June 2018.
  4. The 24th EM Induction Workshop (EMIW2018). Available from: https://emiw2018.emiw.org/ , accessed 21 June 2018.

Biography:

Nigel Rees is a Research Data Management Specialist at the National Computational Infrastructure (NCI) with a background in magnetotelluric geophysics. In his role at NCI, he supports research data needs and assists with the management, publishing and discovery of data.

Derived through analysis: linking policy literature and datasets

Les Kneebone1, Steven McEachern2, Janet McDougal2

1Analysis & Policy Observatory, Melbourne, Australia

2Australian Data Archive, Canberra, Australia

 

The research community has been witnessing new and innovative approaches to making data objects discoverable, outside of traditional scholarly publishing contexts. Datasets referenced within scholarly publications can be made persistently identifiable using the same identifying approaches used for the publications themselves. Datasets can be stored, discovered and reused via specialist dataset repository platforms. Therefore, creating graph databases of interlinked research data and publications is now a reality.

Linking grey literature publications to datasets presents special challenges. Analysis & Policy Observatory (APO) has, since 2004, collected grey policy literature and organized its collection with ubiquitous and emerging metadata standards. APO now focuses on expanding the reach of its collection by establishing links with other research objects. Datasets, from which policy reports are derived, are of special interest. APO is therefore working with Australian Data Archive (ADA) to connect its datasets with APO grey literature.

The challenge for grey literature and dataset linking

Persistent identifiers (PIDs) for digital information objects are well recognized as key data points needed as the basis for links between objects. PIDs such as Digital Object Identifiers (DOIs) have enjoyed significant uptake in traditional academic publishing contexts. Minting DOIs for grey literature, in contrast, is an exceptional practice in the policy sector. APO is taking a lead in promoting use of DOIs for grey literature – nonetheless, DOI coverage remains sporadic in policy collections. A similar context exists for datasets – DOIs are often minted after original publication and only once harvested and curated within special data repositories. ADA, like APO, has undertaken the significant challenge of assigning PIDs to datasets. The challenge for linking grey literature, then, is one in which structured publication data is not always available to work with.

The response from ADA and APO

Researchers cannot wait for all research objects to become entities. As research repository custodians, we will miss opportunities to combine our collections in ways that help researchers if we wait for complete, or near complete, PID coverage. Therefore, ADA and APO are piloting approaches to linking objects using a combination of unstructured, semi-structured and structured data:

  • Text mining and natural language processing, to help predict semantic and logical links
  • Leveraging metadata, such as controlled vocabularies, to improve link prediction
  • Locating and matching existing PIDs in each repository
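The third approach, locating and matching existing PIDs, can be sketched roughly as follows. This is a minimal illustration and not the pilot's actual method: the regular expression is a common approximation of DOI syntax, and the function names, report identifiers and dataset identifiers are all hypothetical.

```python
import re

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)


def extract_dois(text):
    """Return the set of DOIs found in free text, lower-cased for comparison."""
    found = set()
    for match in DOI_PATTERN.findall(text):
        # Trim punctuation that commonly trails a DOI at the end of a sentence.
        found.add(match.rstrip(".,;:)]").lower())
    return found


def link_reports_to_datasets(reports, datasets):
    """Map report id -> sorted dataset ids whose DOI appears in the report text.

    reports:  {report_id: full text or reference list}
    datasets: {dataset_id: DOI string}
    """
    doi_to_dataset = {doi.lower(): ds_id for ds_id, doi in datasets.items()}
    links = {}
    for report_id, text in reports.items():
        hits = sorted(
            doi_to_dataset[doi] for doi in extract_dois(text) if doi in doi_to_dataset
        )
        if hits:
            links[report_id] = hits
    return links
```

Exact matching of this kind only finds links where both repositories already hold a PID, which is why it would need to be combined with the text-mining and vocabulary-based approaches listed above when PID coverage is sporadic.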

From the pilot, APO and ADA hope to learn the following:

  • What is PID coverage in our repositories?
  • At what aggregation level should links be made, i.e. collection vs item level?
  • What commonalities and opportunities exist in respective metadata approaches?
  • How can taxonomies be leveraged to improve predictions and matches?
  • What interfaces between the repositories are scalable, reusable and in scope?

This research was funded by the Australian Research Council Linkage Infrastructure, Equipment and Facilities grant Linked Semantic Platforms (LE180100094).


Biographies:

Les Kneebone has worked in information management roles in government, school, community and research sectors since 2002. He has mainly contributed to managing metadata, taxonomies and cataloging standards used in these sectors. Les is currently supporting the Analysis & Policy Observatory by developing and refining metadata standards and services that will help to link policy literature with datasets.

Steven McEachern is Director and Manager of the Australian Data Archive at the Australian National University, where he is responsible for the daily operations and technical and strategic development of the data archive. He has high-level expertise in survey methodology and data archiving, and has been actively involved in development and application of survey research methodology and technologies over 15 years in the Australian university sector. https://orcid.org/0000-0001-7848-4912

Janet McDougall is a Senior Data Archivist at the Australian Data Archive, with a background in systems IT, data management, GIS, and social research. Her role includes outreach and curation of research data for preservation, archiving and publication. She is also involved in the ongoing implementation of metadata and standards focussed mainly in the social sciences and humanities, but also has experience with long-term ecological data from curation and procedural perspectives.

Connecting DMPs to power up research BoF

Natasha Simons1, Kathryn Unsworth2, Andrew Janke3, Peter Neish4, Liz Stokes5

1Australian Research Data Commons, Brisbane, Australia, natasha.simons@ands.org.au

2CSIRO, Melbourne, Australia, Kathryn.Unsworth@csiro.edu

3National Imaging Facility, The University of Queensland, Brisbane, Australia, andrew.janke@uq.edu.au

4The University of Melbourne, Melbourne, Australia, peter.neish@unimelb.edu.au

5University of Technology Sydney, Sydney, Australia, elizabeth.stokes@uts.edu.au

 

DESCRIPTION

Data Management Plans (DMPs) [1] are of increasing importance in the world of research. They are a requirement of many research funders and recommended by some research institutions. They are claimed to save time and effort for researchers, research institutions and funders, but have increasingly been viewed as a compliance burden with no evidence for efficacy [2]. International discussions within the community that creates and supports DMP tools, such as the RDA Active Data Management Plans Interest Group [3], reflect a dynamic discussion about how DMPs can be made more effective and useful and to which standards can be applied. The Australian DMP community is heavily involved in contributing to and leading these discussions. Our community has also developed new, innovative tools and approaches that connect DMPs to things and people such as library systems, researchers, storage, ethics, persistent identifiers, research infrastructure providers, and so on (see examples from the eResearch Australasia 2017 DMPs workshop [4]). In some cases it can be shown that connecting services such as storage to minimal DMPs can drive uptake and compliance with institutional systems, and in an environment of shrinking budgets it is vital that we maximise the benefits of the research systems we use.

This BoF session will bring the Australasian DMP community together to discuss connecting DMPs to power up research. It will build on discussions held at the eResearch Australasia 2017 DMP workshop and at the Australasian DMP Interest Group meetings [5]. Outcomes include the production of a comparative table of connected DMPs, the things they connect to, and the standards they use.

AUDIENCE

This BoF will be of interest to those implementing and supporting DMPs and DMP tools and particularly, those interested in discussing new approaches to connect DMPs for increased research efficiency and impact. Participants are asked to come prepared to contribute their ideas and experience to a lively discussion.

SESSION STRUCTURE

We will kick off with lightning talks presented by those who are working at the cutting edge of international developments in DMP tools and approaches. This will include updates from people involved in co-chairing international connected DMP working groups. We will then move to facilitated Q&A and participants will be encouraged to contribute to an open discussion to share experiences, explore ideas and ask questions. Feedback on a comparative table of connected DMPs will be sought throughout the session.

OUTCOMES

Participants in this BoF will come away with a better understanding of why and how DMPs are being connected to power better research and the things they are connected to. They will have had an opportunity to hear and comment upon international and local DMP tools and approaches with a view to future developments. Outcomes include the production of a comparative table of connected DMPs.

REFERENCES

  1. Australian National Data Service. Data Management Plans. http://www.ands.org.au/working-with-data/data-management/data-management-plans, accessed 20 June 2018.
  2. Neylon C (2017) Compliance Culture or Culture Change? The role of funders in improving data management and sharing practice amongst researchers. Research Ideas and Outcomes 3: e14673. https://doi.org/10.3897/rio.3.e14673, accessed 22 June 2018.
  3. Research Data Alliance Active DMPs IG. https://www.rd-alliance.org/groups/active-data-management-plans.html, accessed 20 June 2018.
  4. DMPs workshop, eResearch Australasia 2017. https://www.ands.org.au/partners-and-communities/ands-communities/dmps-interest-group#DMP_s_workshop_eResearch_Australasia_17_Oct_2017-2, accessed 20 June 2018.
  5. Australasian DMPs Interest Group. https://www.ands.org.au/partners-and-communities/ands-communities/dmps-interest-group, accessed 20 June 2018.

Biography:

Natasha Simons is Program Leader, Skills Policy and Resources, with the Australian National Data Service (ANDS). She works with a variety of people and groups to improve data management skills, platforms, policies and practices. With a background in libraries, IT and eResearch, she has a history of developing policy, technical infrastructure and skills to support research and researchers. She is co-chair of the Research Data Alliance Interest Group on Data Policy Standardisation and Implementation and co-convenes an Australasian Data Management Plans Interest Group. Natasha is the Deputy Chair of the Australian ORCID Advisory Group and an Industry Fellow at The University of Queensland in Brisbane, Australia.

Journeying towards digital asset nirvana in CSIRO with scientists – data management challenges and tales from the trenches

Dr Jonathan Yu1, Dr David Lemon2, Mr Peter Fitch2, Mr Paul Box4, Dr Simon Cox1, Mr Benjamin Leighton1, Mr Andrew Freebairn2, Mr Ashley Sommer3, Mr Matthew Stenson3

1CSIRO, Clayton, Australia,

2CSIRO, Black Mountain, Australia,

3CSIRO, Brisbane, Australia,

4CSIRO, North Ryde, Australia

 

CSIRO Land and Water (CSIRO L&W) has been on a journey in search of ‘digital asset nirvana’. There has been an increasing recognition of the complexity and fast changing digital landscape and its influence in how science is undertaken today. Data used in science analyses are increasing in volume and complexity and managing this across projects becomes a key challenge across the organisation. There is an increasing requirement for science workflows to be more agile, repeatable, reproducible, and reusable across projects and initiatives globally.  Therefore, there is a desire from researchers and management to realise the value from these digital assets (internal and external) created and used for research projects in CSIRO L&W (see Figure 1).

Figure 1. Realising value from L&W Digital Assets

Digital asset management in CSIRO provides a particular context for scientific data management:

  • Scientific data is generated in the context of time-bound projects, which are mostly not part of ongoing programs such as those undertaken by agencies such as GA, BoM, ABS
  • Most of CSIRO’s proximate customers include those from the private sector, and therefore tend not to be part of ‘the traditional research community’ at large
  • CSIRO provides tools for staff and partners to publish data via a CSIRO institutional data repository called the Data Access Portal (DAP). DAP is positioned primarily for data publication for public access; however, it also features other access arrangements. Challenges exist around internal research project data lifecycle management and providing research project officers with enough incentives, know-how, tools, and low enough costs for researchers to push to DAP if and when appropriate.

Overall these have contributed to a culture in which systematic management of scientific data assets has not been a high priority for most of the researchers who generate digital assets.

In this presentation, we share learnings from challenges and successes while recognising the complex multi-dimensional nature of the CSIRO L&W journey as social, technical and informational. We discuss the influence of social architectures [1, 2] and their application in the CSIRO L&W journey. A key outcome was the establishment of a data council in L&W called the Digital Asset Management Committee (DAMC). DAMC has been designed as a standing committee to enable recommendations on digital asset management initiatives across projects. DAMC has enabled the CSIRO L&W unit to develop particular solutions identified by DAMC as priority areas through a project called Project DAMbusters. Figure 2 provides a description of the role of DAMC and DAMbusters and their interaction with L&W staff, external partners, and the L&W Leadership Team (LWLT). In particular, we present informational and technical implementations developed via the DAMbusters project, such as a digital asset registry based on a customised CKAN implementation [3] to enable greater discovery and access of digital assets, supporting an audit capability across multiple sources. We also discuss future directions and next steps in CSIRO, and potential opportunities to collaborate with the broader eResearch community.
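As a rough illustration of how a CKAN-based registry can be queried for discovery, the sketch below builds a request URL for CKAN's Action API `package_search` call and summarises a decoded response. The registry address is a hypothetical placeholder, and the helper names are illustrative assumptions; nothing here describes the customisations made in the DAMbusters registry itself.

```python
from urllib.parse import urlencode


def ckan_search_url(site_url, query, rows=10, start=0):
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{site_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarise_results(response):
    """Pull (name, title) pairs out of a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [(pkg["name"], pkg.get("title", "")) for pkg in response["result"]["results"]]


# Hypothetical registry address, for illustration only.
print(ckan_search_url("https://registry.example.org", "soil moisture", rows=5))
```

Paging through all results with `rows` and `start` is one simple way such a registry could support an audit across registered assets.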

Figure 2. CSIRO L&W’s Digital Asset Ecosystem

References

  1. Box, Paul. Social Architecture: cultivating environmental data ecosystems. In: Jens Klump, Natalia Atkins, Nicholas Car, Simon Cox, et al, editor/s. Linking Environmental Data and Samples; 29 May – 2 June 2017; CSIRO Black Mountain, Canberra. CSIRO; 2017. 38-39.

  2. Box, Paul; Lemon, David. The Role of Social Architecture in Information Infrastructure: A report for the National Environmental Information Infrastructure (NEII). NEII Website – neii.gov.au: CSIRO; 2015. csiro:EP152134. https://doi.org/10.5072/83/5849a28b08365

  3. The CKAN Project, http://ckan.org, Accessed 22/6/2018

Biography:

Dr Jonathan Yu is a data scientist researching information and web architectures, data integration, Linked Data, data analytics and visualisation and applies his work in the environmental and earth sciences domain. He is part of the Environmental Informatics group in CSIRO Land and Water. He currently leads a number of initiatives to develop new approaches, architectures, methods and tools for transforming and connecting information flows across the environmental domain and the broader digital economy within Australia and internationally.

Enabling access to sensitive data at the Australian Data Archive

Dr Steven McEachern1, Ms Janet McDougall1, Ms Marina McGale1

1Australian Data Archive, Acton, Australia

 

The Australian Data Archive (ADA) has supported access to sensitive data since 1981, providing fine-grained access to confidentialised information on the Australian population. As expectations for open data access via the web become increasingly widespread, ADA has continued to develop systems and processes to meet these expectations, while supporting the privacy and confidentiality expectations of the participants in the research. However, meeting FAIR data expectations is particularly challenging when the data itself is sensitive and confidential. The potential for breaches of privacy and confidentiality means that access to such data needs to be restricted. New models of managing such access, such as the Five Safes model (Ritchie, 2017), have been developed to provide a framework for enabling release of such data. The Five Safes framework proposes five areas of emphasis in developing data access models:

  • Safe people: Can the researchers be trusted to use the data in an appropriate manner?
  • Safe projects: Is this an appropriate use of the data?
  • Safe settings: Does the access facility limit unauthorized use?
  • Safe data: Is there a disclosure risk in the data itself?
  • Safe outputs: Are the statistical results non-disclosive?

The elements of the Five Safes framework can be implemented in varying combinations, with different emphasis applied to each of the Five Safes, and different combinations of administrative and technical systems, in order to enable safe data access for sensitive content. The Australian Bureau of Statistics, for example, prioritises safe settings, safe data and safe outputs in enabling access to sensitive microdata for government and academic researchers (Webster, 2016).
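The idea that different access models emphasise different combinations of the Five Safes can be sketched as a simple data structure. This is an illustrative assumption only: the class, the function and the example policy are hypothetical, the yes/no judgements greatly simplify what are in practice graded assessments, and the code does not represent the actual policy of ADA or any other organisation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiveSafes:
    """One yes/no judgement per dimension of the Five Safes framework."""
    safe_people: bool
    safe_projects: bool
    safe_settings: bool
    safe_data: bool
    safe_outputs: bool


def unmet_requirements(assessment, required):
    """Return the required dimensions that the assessment does not satisfy."""
    valid = {f.name for f in fields(FiveSafes)}
    unknown = set(required) - valid
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return sorted(name for name in required if not getattr(assessment, name))


# Hypothetical policy that emphasises settings, data and outputs.
EXAMPLE_POLICY = ["safe_settings", "safe_data", "safe_outputs"]
request = FiveSafes(
    safe_people=True, safe_projects=False,
    safe_settings=True, safe_data=True, safe_outputs=False,
)
print(unmet_requirements(request, EXAMPLE_POLICY))
```

Keeping the policy as a list of required dimensions, separate from the assessment, makes it straightforward to express a different emphasis for a different data collection.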

This presentation will describe recent developments at the ADA to support access to sensitive data through the Dataverse data repository system (http://dataverse.org). Dataverse, developed at Harvard University since 2006, is now used across more than 30 organisations internationally to support open access to research data around the world. In order to provide improved support for sensitive data through Dataverse, the ADA has been working with the Australian Department of Social Services (DSS) to enable access to key data holdings within the DSS National Centre for Longitudinal Data (NCLD). These include key longitudinal data assets held by the NCLD, including the widely used Household Income and Labour Dynamics in Australia survey (HILDA – https://melbourneinstitute.unimelb.edu.au/hilda).

This presentation will provide an overview of the Five Safes framework and the Dataverse software. It will then present a walkthrough of the extensions ADA has made to the Dataverse environment to improve support for sensitive data using the Five Safes model, and the technical and administrative processes ADA has adopted in order to enable access to the DSS data through the Dataverse environment. The presentation will conclude with proposed plans and additional requirements to meet future research needs in the social sciences, humanities and population health.

References

Ritchie, F. (2017) The ‘Five Safes’: a framework for planning, designing and evaluating data access solutions. University of the West of England, Bristol and Administrative Data Service, UK. Available from: https://zenodo.org/record/897821#.Wxp_QVOFPOS

Webster, A. (2016) The Five Safes Framework: How the ABS is supporting use of public sector data. Available from: http://www.nss.gov.au/nss/home.NSF/533222ebfd5ac03aca25711000044c9e/b691218a6fd3e55fca257af700076681/$FILE/The%20Five%20Safes%20Framework.%20ABS.pdf


Biography:

https://orcid.org/0000-0001-7848-4912

Steve is Director of the Australian Data Archive at the Australian National University. He has research interests in data management and archiving, community and social attitude surveys, new data collection methods, and reproducible research methods. Steve holds a PhD in industrial relations and a Graduate Diploma in Management Information Systems, and has been involved in the management of data archives in the social sciences, business and economics for over 15 years. He is currently a member of the executive for the International Federation of Data Organisations in Social Science (IFDO), and chair of the Executive Board of the Data Documentation Initiative (DDI), an international metadata standard for the management of social science research data used in over 80 countries.

FAIR and friendly data services

Adrian Burton1, Carsten Friedrich2, Sebastien Mancini3, Bruce Simons4, Lesley Wyborn5, Mr Geoffrey Squire2, Dr Peter Dahlhaus4

1Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.edu.au

2CSIRO Data61, Canberra, Australia, Carsten.Friedrich@data61.csiro.au

3IMOS, Hobart, Australia, sebastien.mancini@utas.edu.au

4CeRDI Federation University, Ballarat, Australia, b.simons@federation.edu.au

5ANU, Canberra, Australia, lesley.wyborn@anu.edu.au

 

DESCRIPTION

Data services have become an integral part of the research, government and industry sectors. They provide automated functions for the creation, access, processing and analysis of data. The development of data-focused services is steadily increasing in Australia, for example in the NCRIS capabilities (e.g., AuScope, IMOS, TERN, AURIN, ALA, NCI), CSIRO and government agencies (e.g., GA, Department of Environment, and ABS); all are moving to more formal publishing of data through services.

Properly deployed, standards-conformant web services should enable cross-domain discovery of, and in-situ programmatic access to and processing of, data from multiple distributed sources. However, three fundamental issues currently impede more efficient use of data services in Australia:

  1. Findability and accessibility – a lack of consistency in service descriptions that makes it hard to discover data services and act on them;
  2. Interoperability and reusability – a lack of, or variable implementation of, standard protocols and information models that makes it hard to aggregate identical data types from multiple sources; and
  3. Agreement – a lack of agreement on which data services standard to implement for a particular dataset.

This results in at least four approaches:

  1. Data from distributed resources are centralised (cached) in a single locality, harmonised and then made accessible via services from that central location;
  2. Data providers and/or facilities are asked to support an unsustainable number of protocols and standards;
  3. Data providers are asked to provide a custom modification to an individual service so that the data set can be accessed by a specific community; and
  4. Ideally, data services are provided by multiple sources, conform to widely used, internationally agreed standards, and can be sustainably accessed for many and varying use cases.

To make data services FAIRer and improve interoperability across multiple domains, for multiple use cases, ARDC has been organising two parallel activities:

  1. We formed a focus group with members from the NCRIS capabilities (ALA, AuScope, IMOS, TERN, NCI and a nascent Agriculture capability) and government agencies (e.g., CSIRO, GA, BoM) that are working specifically on the ARDC-funded Data Enhanced Virtual Laboratories (DeVL) and Research Data Cloud (RDC) Projects in GeoScience, Marine Science, EcoScience, Climate and Agriculture. The group discusses standardisation of data service descriptions and APIs across these projects, with a primary focus on data services that are compliant with a collection of OGC standards, OPeNDAP protocols, THREDDS data servers and GeoNetwork catalogues.

Based on community-agreed service descriptions and an API, the ARDC Services team is developing a national service registration and discovery layer for both service providers and service consumers (Figure 1). The discovery layer will address the findability issue and should provide a one-stop shop for data consumers to search for and access data services offered across NCRIS facilities, universities, science agencies and government data providers that are participating in these particular DeVL and RDC projects. The interoperability and accessibility issues will be addressed by the community of data providers and consumers converging on common practice.

  2. We started a wider Australian Data Services Interest Group to facilitate discussion and the exchange of information and experience of data services development across a broader range of Australian communities, including those involved in developing international standards for data service description and access. This interest group meets every three months and intends to take the lessons learnt from the more specific focus group and extend them to a wider community.

The two groups are in partnership with the Earth Science Information Partners (ESIP) of the US, in particular the ESIP Information Interoperability and Technology Committee and the ESIP Data Stewardship Committee. ESIP is supported by NASA, NOAA, USGS and 110+ member organizations.

Figure 1. Discovering Data Services through a Services Registry

We propose a 60-minute BoF session. The session provides a venue for a face-to-face meeting of the interest group, and also enables us to involve people from the wider community. The BoF will include an introduction to data services, the associated standards, and the efforts we have made so far to make data services FAIRer. We also invite people from Agriculture, Geoscience and Marine Science to introduce their implementations of data services.


Biographies:

Adrian Burton is Director of Services at the Australian National Data Service. Adrian has provided strategic input into several national infrastructure initiatives and is active in building national policy frameworks to unlock the value in the research data outputs of publicly funded research.

Carsten Friedrich is a Research Team Leader at CSIRO Data61.  At CSIRO he worked in a variety of areas including Cloud Computing, Cyber Security, Virtual Laboratories, and Scientific Software Registries.

Bruce Simons has 24 years of experience in geophysical surveying and interpretation, and 17 years in UML data modelling and XML/GML schema development, implementing interoperable network services that use XML markup and OGC web services to enable schematic and semantic interoperability.

Lesley Wyborn currently has a joint adjunct fellowship with NCI.  She is Chair of the Australian Academy of Science ‘Data for Science Committee’ and on the AGU Data Management Advisory Board and the Steering Committee of the AGU-led FAIR Data Publishing Project.

Turning research data management projects into business as usual and improved data management across the enterprise: the La Trobe experience

Dr Andrew Williams1, Ms Eva Fisch1, Ms Rachel Salby1

1La Trobe University, Bundoora, Australia, a.williams3@latrobe.edu.au, e.fisch@latrobe.edu.au, r.salby@latrobe.edu.au

 

La Trobe University is nearing the end of a period of intense, project-based change in the research data management space. Outcomes from recently completed projects at La Trobe include a research data management planning tool, an electronic lab notebook, Figshare, and platforms for publication of surface science and RNA sequence data.

We are very conscious that, while research data management systems are successfully delivered by projects, real improvements in actual management of research data can only be realised with a coordinated approach to communications and change and to supporting researchers in improved research data management practice.

As we transition from projects to business as usual, dedicated project staff are returning to their substantive roles and project knowledge is at risk of dispersing. It is clear that one-shot training sessions, expecting generalist research support librarians to be technical experts, and relying on project documentation won’t be enough to stand up ongoing support.

We are using several approaches to transfer knowledge to the teams who will be responsible for supporting these systems:

  • storage of documented knowledge in places that are accessible and searchable by support staff
  • secondment of support staff into project roles while opportunities are available as a strategy to upskill them in research data management
  • actively working to transfer knowledge with hands-on systems training in the concluding phases of the projects
  • encouraging support staff to volunteer to be champions and ambassadors for systems
  • internal communities of practice and discussion groups focussed on research data management issues.

We believe that support staff need to feel ownership of, and investment in, the transition for it to succeed, and we are working to create that.

Finally, we are also looking to convene University governance for research data management that will ensure support is coordinated across a number of providers.

This presentation will provide a detailed case study of the ways La Trobe University is transitioning from multiple projects to enterprise-wide, business-as-usual support for improved management of research data.


Biography:

Rachel provides expertise to help develop research data management training, support research data management systems, and to plan and execute support for research data management processes, including the transfer of skills and expertise, from enterprise research data management systems projects to the library research team.

Making data access easier with OPeNDAP

Adrian Burton1, Ben Evans3, Justin Freeman4, Gareth Williams5, James Gallagher2, Duan Beckett4, Kate Snow3, Robert Davy5, Mingfang Wu1

1Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.edu.au

2OPeNDAP™

3National Computational Infrastructure, Canberra, Australia, Nigel.Rees@anu.edu.au

4Bureau of Meteorology, Melbourne, Australia, justin.freeman@bom.gov.au

5CSIRO, Melbourne, Australia, Gareth.Williams@csiro.au

 

DESCRIPTION

As more and more data are collected and made discoverable and available, there is a growing need to make data easily accessible. Accessing data through a downloadable URL from the web is convenient for small data sets, but not for big data sets, for slicing a subset from a huge data collection, or for assembling a dataset from multiple data sets in different data formats. OPeNDAP (Open-source Project for a Network Data Access Protocol) provides a framework for making scientific data available to remote consumers via the web. It is also a software framework that simplifies all aspects of data networking, allowing simple access to remote data. Data providers can build their data provision server on top of the OPeNDAP framework, or deploy existing solutions such as THREDDS, Hyrax, ERDDAP or PyDAP, to make their data accessible, whether the data is stored in CSV, HDF or NetCDF files, in databases, or in other formats. Data consumers, in turn, can access the data remotely through custom-built OPeNDAP clients such as NASA Earthdata Search, or through any general tool that supports web access, such as R, Python, MATLAB, or ArcGIS.
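As a rough illustration of the slicing described above, the sketch below builds a DAP2 request URL that asks a server for only part of one variable, using the protocol's `[start:stride:stop]` index ranges. The server address, file name and variable name (`example.org`, `sst.nc`, `sst`) are hypothetical placeholders, and the helper functions are illustrative only; they are not part of any OPeNDAP library.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop]; stop is inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("expected 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, ranges, response="ascii"):
    """Build a DAP2 URL requesting a slice of a single variable.

    ranges holds one (start, stop) or (start, stop, stride) tuple per
    dimension of the variable, in the variable's own dimension order.
    response is the DAP2 response suffix, e.g. "ascii", "dods" or "dds".
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    # Keep the characters that carry meaning in a constraint expression.
    return f"{dataset_url}.{response}?{quote(constraint, safe='[]:,')}"


# Hypothetical dataset: first time step, a 11 x 6 window of the grid.
url = subset_url(
    "https://example.org/opendap/sst.nc",  # placeholder server and file
    "sst",                                 # placeholder variable name
    [(0, 0), (10, 20), (30, 40, 2)],
)
print(url)
```

Only the requested window travels over the network, which is the practical difference from downloading the whole file. In day-to-day use, client libraries in R, Python or MATLAB typically construct such requests on the user's behalf.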

This 60-minute BoF session will feature presentations from BOM, NCI, IMOS, and CSIRO on their OPeNDAP applications. The BoF is open for discussion of the latest tooling, standards and vocabularies, DAP-based data retrieval and access architectures, science applications, and FAIR for DAP, among many other topics. We will also gather community input on future actions, such as organising a set of workshops.

We are also in partnership with the Earth Science Information Partners (ESIP) of the US to form an OPeNDAP community, in particular with the ESIP Information Interoperability and Technology Committee and the ESIP Data Stewardship Committee. ESIP is supported by 110+ member organizations including OPeNDAP, Unidata and HDF.


Biography:

Adrian Burton is Director of Services at the Australian National Data Service. Adrian has provided strategic input into several national infrastructure initiatives and is active in building national policy frameworks to unlock the value in the research data outputs of publicly funded research.

Ben Evans is Associate Director of Research Engagement and Initiatives.

Peter Blain is a project leader, information systems architect, cognitive scientist and entrepreneur.

Justin Freeman is a High Performance Computing Application Specialist at the Bureau of Meteorology.

Gareth Williams leads a small team of Data Intensive Computing specialists in CSIRO’s Scientific Computing support group.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.


© 2017 - 2018 Conference Design Pty Ltd