Turning research data management projects into business as usual and improved data management across the enterprise: the La Trobe experience

Dr Andrew Williams1, Ms Eva Fisch1, Ms Rachel Salby1

1La Trobe University, Bundoora, Australia, a.williams3@latrobe.edu.au, e.fisch@latrobe.edu.au, r.salby@latrobe.edu.au


La Trobe University is nearing the end of a period of intense, project-based change in the research data management space. Outcomes from recently completed projects at La Trobe include a research data management planning tool, an electronic lab notebook, Figshare, and platforms for publication of surface science and RNA sequence data.

We are very conscious that, while projects can successfully deliver research data management systems, real improvements in how research data is actually managed can only be realised through a coordinated approach to communications and change, and through ongoing support for researchers in improved research data management practice.

As we transition from projects to business as usual, dedicated project staff are returning to their substantive roles and project knowledge is at risk of dispersing. It is clear that one-off training sessions, expecting generalist research support librarians to be technical experts, and relying on project documentation will not be enough to stand up ongoing support.

We are using several approaches to transfer knowledge to the teams who will be responsible for supporting these systems:

  • storage of documented knowledge in places that are accessible and searchable by support staff
  • secondment of support staff into project roles while opportunities are available as a strategy to upskill them in research data management
  • actively working to transfer knowledge with hands-on systems training in the concluding phases of the projects
  • encouraging support staff to volunteer to be champions and ambassadors for systems
  • internal communities of practice and discussion groups focussed on research data management issues.

We believe that support staff need a sense of ownership of, and investment in, the transition for it to succeed, and we are working to create that.

Finally, we are also looking to convene University governance for research data management that will ensure support is coordinated across a number of providers.

This presentation will provide a detailed case study of the ways La Trobe University is transitioning from multiple projects to enterprise-wide, business-as-usual support for improved management of research data.


Rachel provides expertise to help develop research data management training, support research data management systems, and plan and execute support for research data management processes, including the transfer of skills and expertise from enterprise research data management systems projects to the library research team.

Making data access easier with OPeNDAP

Adrian Burton1, Ben Evans3, Justin Freeman4, Gareth Williams5, James Gallagher2, Duan Beckett4, Kate Snow3, Robert Davy5, Mingfang Wu1

1Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.edu.au


3National Computational Infrastructure, Canberra, Australia, Nigel.Rees@anu.edu.au

 4Bureau of Meteorology, Melbourne, Australia, justin.freeman@bom.gov.au  

5CSIRO, Melbourne, Australia, Gareth.Williams@csiro.au



As more and more data are collected and made discoverable and available, there is a growing need to make them easily accessible. Accessing data through a downloadable URL on the web is convenient for small datasets, but not for big datasets, for slicing a subset out of a huge collection, or for assembling a dataset from multiple sources in different formats. OPeNDAP (Open-source Project for a Network Data Access Protocol) provides a framework for making scientific data available to remote consumers via the web. It is also a software framework that simplifies many aspects of data networking, allowing simple access to remote data. Data providers can build their data services on top of the OPeNDAP framework, or deploy existing solutions such as THREDDS, Hyrax, ERDDAP or PyDAP, to make their data accessible whether the data are stored in CSV, HDF or NetCDF files, in databases, or in other formats. Data consumers can access the data through custom-built OPeNDAP clients such as NASA Earthdata Search, or through general tools such as R, Python, MATLAB or ArcGIS that support web access.
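To make this concrete, below is a minimal sketch of consuming an OPeNDAP endpoint from Python with xarray; the endpoint URL and variable name are hypothetical placeholders rather than a real published service, and any OPeNDAP-enabled tool could be substituted.

    # A minimal sketch: open a remote OPeNDAP dataset and request only a slice.
    # The endpoint URL and variable name below are hypothetical placeholders.
    import xarray as xr

    url = "https://example.org/thredds/dodsC/collection/dataset.nc"  # hypothetical OPeNDAP URL
    ds = xr.open_dataset(url)   # reads metadata only; requires the netcdf4 or pydap backend
    print(ds)                   # inspect variables, dimensions and attributes

    # Server-side subsetting: only the requested slice is transferred over the network.
    subset = ds["sea_surface_temperature"].sel(
        time=slice("2018-01-01", "2018-01-31"),
        lat=slice(-45, -10),
        lon=slice(110, 155),
    )
    subset.load()               # triggers the actual (small) data transfer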

This 60-minute BoF will feature presentations from the BoM, NCI, IMOS and CSIRO on their OPeNDAP applications. The BoF is open for discussion of the latest tooling, standards and vocabularies, DAP-based data-retrieval and access architectures, science applications, and FAIR for DAP, among many other topics. We will also gather community input on future actions, such as organising a dedicated set of workshops.

We are also partnering with the US-based Earth Science Information Partners (ESIP), in particular the ESIP Information Interoperability and Technology Committee and the ESIP Data Stewardship Committee, to form an OPeNDAP community. ESIP is supported by 110+ member organisations, including OPeNDAP, Unidata and HDF.


Adrian Burton is Director of Services at the Australian National Data Service. Adrian has provided strategic input into several national infrastructure initiatives and is active in building national policy frameworks to unlock the value in the research data outputs of publicly funded research.

Ben Evans is Associate Director of Research Engagement and Initiatives.

Peter Blain is a project leader, information systems architect, cognitive scientist and entrepreneur.

Justin Freeman is a High Performance Computing Application Specialist at the Bureau of Meteorology.

Gareth Williams leads a small team of Data Intensive Computing specialists in CSIRO’s Scientific Computing support group.

Identifying, connecting and citing research with persistent identifiers

Natasha Simons1, Andrew Janke2, Jens Klump3, Lesley Wyborn4, Adrian Burton5, Siobhann McCafferty6, Gerry Ryder7

1Australian Research Data Commons, Brisbane, Australia, natasha.simons@ardc.edu.au

2National Imaging Facility, Centre for Advanced Imaging, UQ, Brisbane, Australia, andrew.janke@uq.edu.au

3CSIRO Mineral Resources, Perth, Australia, jens.klump@csiro.au

4National Computational Infrastructure, Canberra, Australia, lesley.wyborn@anu.edu.au

5Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.edu.au

6Australian Access Federation, Brisbane, Australia, siobhann.mccafferty@aaf.edu.au

7Australian Research Data Commons, Adelaide, Australia, gerry.ryder@ardc.edu.au



Increasingly, the research community, including funders and publishers, is recognising the power of ‘connected up’ research to facilitate reuse, reproducibility and transparency of research. Persistent identifiers (PIDs) are critical enablers for identifying and linking related research objects including datasets, people, grants, concepts, places, projects and publications.   PID systems:

  • Provide social and technical infrastructure to identify and cite a research output over time
  • Enable machine readability and exchange (a minimal metadata-retrieval sketch follows this list)
  • Collect and make available metadata that can provide further context and connections
  • Facilitate the linkage and discovery of research outputs, objects, related people and things
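As a small illustration of the machine-readability point above, the sketch below resolves a DOI to machine-readable metadata using standard DOI content negotiation; the DOI itself is a hypothetical placeholder.

    # A minimal sketch: resolve a DOI to machine-readable (CSL-JSON) metadata
    # via DOI content negotiation. The DOI below is a hypothetical placeholder.
    import requests

    doi = "10.1234/example-dataset"  # hypothetical DOI
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    metadata = resp.json()

    # Typical fields include the title, creators and year, which is enough to
    # build a citation or to link the output to related objects.
    print(metadata.get("title"))
    print([a.get("family") for a in metadata.get("author", [])])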

Join this BoF to learn about recent developments in PID services and infrastructure with a particular focus on DOI (research data), ORCID (people and organisations), RAID (research activities and projects) and IGSN (physical samples and specimens).

Find out how to maximise the return on your investment in PIDs through participation in global initiatives such as Scholix and the Research Data Switchboard, which use PIDs to offer researchers and research institutions a richer, more connected experience.


This BoF will be of interest to those implementing, maintaining and supporting PID services including repository managers, developers and librarians. Participants should come along prepared to exchange knowledge, share experiences and contribute to discussions about optimising the ‘power of PIDs’.


The session will kick off with brief lightning talks presented by those working at the cutting edge of global developments in PID services and infrastructure.  Following facilitated Q&A, participants will be encouraged to contribute to an open discussion to share experiences, explore ideas and ask questions.


Participants will leave the BoF with a fresh perspective on the opportunities PIDs can offer researchers and research organisations.  We envisage that many participants will be prompted to explore ideas raised during the session in greater depth, as they might apply to their organisation.
The BoF will also offer participants the opportunity to establish or strengthen connections with the broader PID community in Australia and internationally.


Natasha Simons is Program Leader, Skills Policy and Resources with the Australian National Data Service.

IGSN: a persistent identifier for physical samples

Adrian Burton1, Jens Klump2, Lesley Wyborn3, Gerry Ryder4

1Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.edu.au

2CSIRO Mineral Resources, Perth, Australia, jens.klump@csiro.au

3National Computational Infrastructure, Canberra, Australia, lesley.wyborn@anu.edu.au

4Australian Research Data Commons, Adelaide, Australia, gerry.ryder@ardc.edu.au



The International Geo Sample Number (IGSN) is designed to provide an unambiguous, globally unique, persistent identifier for physical samples.  It facilitates the location, identification and citation of physical samples used in research.  While applicable to any type of physical sample, the impetus for the IGSN has come largely from the earth science community, where IGSNs are assigned to geological and environmental samples such as rocks, drill cores, soils, water and gas, as well as to related sampling features such as sections, dredges, wells and drill holes.

The IGSN system is underpinned by the Handle System and is governed by an international organisation, the IGSN Implementation Organization e.V.


There are numerous examples of the fundamental role persistent identifiers play in the global sharing of information, resources and objects.  The DOI is one widely known example, while others such as ORCID are rapidly gaining traction in the research community.

Assigning IGSN to samples:

  • facilitates the discovery, access, sharing and citation of samples
  • supports preservation and access of sample data
  • aids identification of samples in the literature
  • supports tracking of samples across laboratories and sample storage
  • advances the exchange of digital sample data among interoperable data systems, for example by enabling a sample to be linked to the:
    • data derived from it
    • literature where the sample and data are interpreted
    • curator or collector of the sample.


There are four agencies in Australia implementing IGSN.  All have taken up membership of IGSN e.V. to become IGSN allocating agents for identified stakeholder groups that collect or curate earth science samples for research.

  • Curtin University: allocating agent for Curtin University facilities, staff and HDR students
  • CSIRO: allocating agent for CSIRO facilities and staff
  • Geoscience Australia: allocating agent for Geoscience Australia facilities and staff, and those associated with State Geological Surveys
  • Australian Research Data Commons (ARDC): allocating agent for University staff and those working in publicly funded research organisations not covered above


The ARDC IGSN service was developed in collaboration with AuScope as a key component of the Geoscience Data Enhanced Virtual Laboratory (GeoDEVL) project, and was released in July 2018.

Criteria for using the ARDC IGSN service:

  • six mandatory metadata elements must be provided at the time of registration; providing additional descriptive metadata will increase the potential for discovery, reuse and citation of the registered sample
  • the sample being identified should be associated with an Australian research activity
  • IGSN identifiers should resolve to a metadata record describing the sample (a minimal resolution sketch follows this list)
  • the sample being identified, and associated metadata, should be curated through the research and sample lifecycle
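To illustrate the resolution requirement above, the sketch below resolves an IGSN like any other handle; the sample number is a placeholder, and the 10273 handle prefix is stated here as an assumption about the IGSN handle namespace.

    # A minimal sketch: an IGSN is registered as a handle, so it can be resolved
    # through the global Handle System. The prefix 10273 is assumed to be the
    # IGSN namespace, and the sample number below is a hypothetical placeholder.
    import requests

    igsn = "AU1234"  # hypothetical sample number
    resp = requests.get(f"https://hdl.handle.net/10273/{igsn}", allow_redirects=True, timeout=30)
    print(resp.status_code)  # 200 if the identifier resolves
    print(resp.url)          # landing page with the metadata record for the sample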


While the scope of the ARDC IGSN service is currently limited to earth science samples, the ARDC is interested in working with other communities in order to extend the service for use with other physical sample types such as vegetation, archaeological and biological specimens.  It is anticipated that development work to extend the service will commence Q1 2019 and the ARDC welcomes enquiries from prospective users.

Worth noting is that IGSN e.V. recently secured a Sloan Foundation Grant to enable further development of IGSN technical infrastructure and governance.  The ARDC will have an active role in this project which represents a significant investment in future sustainability of the IGSN system.


Dr Adrian Burton is Director, Services with the Australian National Data Service.

Reviving an old and valuable collection of microscope slides of physical samples through the use of Citizen Science

Mr John Pring1, Dr Lesley Wyborn2, Mr Neal Evans1

1Geoscience Australia, Canberra, Australia, john.pring@ga.gov.au, neal.evans@ga.gov.au

2Australian National University, Canberra, Australia, lesley.wyborn@anu.edu.au


The importance of Australia’s mineral wealth has been well recognised since at least Federation in 1901; however, the perceived importance and value of the underlying data have fluctuated.

Through successive agencies, the Australian Federal Government has collected a considerable quantity of physical samples and data over the last 100 years, including historically significant samples, many of which cannot be replaced because the source locations are no longer accessible.  One of the more valuable collections now hosted by Geoscience Australia (GA) comprises 250,000+ microscope-slide thin sections of these physical samples, collected during hundreds of field mapping campaigns across Australia, Papua New Guinea, Antarctica and beyond.

Figure 1: BMR Field Camp 1956

With the progress of time and technology, and the natural human tendency to use only what is readily available, the largely paper-based management system for the slide collection has seen use of this public collection decline greatly since its heyday in the latter half of the 20th century.

GA initiated a project to rescue the microscope collection and its metadata. Much of the metadata was recorded on handwritten cards or in log books, and this needed to be captured and then updated to be compatible with current GA online management systems. Given the tight fiscal constraints on the agency, there were insufficient geoscience experts available for the task, so the large quantity of card- and register-based information had to be captured in a usable digital form in a non-traditional way. The project decided to make extensive use of the DigiVol [1] citizen science portal to transcribe the paper-based records, letter for letter and number for number, using citizen scientists with no geological expertise.

However, because of the age of the collection, it was not simply a matter of transcribing handwritten data and then making this information available as-is. The legacy information had to be updated if it was to be reusable and compatible with modern GA corporate databases, particularly for content that now follows international standards and specifications for digital data that did not exist when the original samples and descriptive information were collected. A few subject matter experts (SMEs), including volunteer retirees who collected some of the material, were then involved in a consultative manner for the data validation stage. Firstly, the location of each sample needed to be translated into modern datums and spatial referencing techniques. Some of the locations had to be retrieved from pin holes in air photographs or from text-based location descriptions (e.g. “Fullerton Gully 3.5M S.S.E. Gurrumba” [2]).  Because of the uncertainty of many of the locations, care was also taken to record the accuracy of each position, which in some cases was +/- several kilometres. Secondly, the SMEs provided valuable expertise to help update the information to modern standards so that it could be seamlessly integrated into the GA databases. Once there, it will be possible to make this legacy data available to industry, researchers and the general public through the current GA data access mechanisms.
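As an illustration of the datum translation step, the sketch below converts a legacy AGD66 latitude/longitude to GDA94 with pyproj; the coordinates are invented, and a production workflow would also carry the recorded positional uncertainty.

    # A minimal sketch: reproject a legacy AGD66 position into GDA94.
    # EPSG:4202 = AGD66 geographic, EPSG:4283 = GDA94 geographic.
    from pyproj import Transformer

    transformer = Transformer.from_crs("EPSG:4202", "EPSG:4283", always_xy=True)

    lon_agd66, lat_agd66 = 145.1234, -17.5678          # hypothetical legacy reading
    lon_gda94, lat_gda94 = transformer.transform(lon_agd66, lat_agd66)
    print(f"GDA94 position: {lat_gda94:.5f}, {lon_gda94:.5f}")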

Using citizen scientists to do the valuable initial transcription made much more effective use of the few SMEs available to the project: the SMEs could focus on improving the quality of the information and providing consulting support to the citizen scientists.

This presentation will explore the approach taken by Geoscience Australia and the benefits to the organisation, the roles of citizen science participants (without whom this legacy collection would not have been made accessible), and the untapped potential for this valuable new data collection.


  1. DigiVol citizen science transcription site available from https://volunteer.ala.org.au/ accessed 21 June 2018
  2. Geoscience Australia Rock Register #2, page 34 (Reg No. 16583)


John Pring holds a Masters of Management Studies (Project Management/Technology and Equipment) from the University of New South Wales and an Electrical Engineering Degree from the University of Southern Queensland.

He has been Senior Project Manager within the Environmental Geoscience Division of Geoscience Australia for some 10 years and has run a number of projects associated with the management of the agency’s data and physical collections over that time.

He has held similar roles within other government agencies prior to joining Geoscience Australia.

CSIRO Knowledge Network: supporting tailored data discovery and access

Jonathan Yu1, Benjamin Leighton2, Jevy Wang3, Hendra Wijaya4

1CSIRO L&W, Clayton, VIC, Australia, jonathan.yu@csiro.au

2CSIRO L&W, Clayton, VIC, Australia, ben.leighton@csiro.au

3CSIRO L&W, Black Mountain, ACT, Australia, jevy.wang@csiro.au

4CSIRO/Data61, North Ryde, NSW, Australia, hendra.wijaya@csiro.au


Discovery and access of data to support research projects and policy analysis is currently limited. While many services increasingly publish data, for researchers and policy analysts these data are not easily discoverable and accessible, are not comprehensive, and are not linked with tools and approaches that promote their use. On the other hand, data providers are often disconnected from user groups and lack the ability to capture, attribute and accrue value to justify further business cases for making their data more discoverable, accessible, interoperable and reusable. This is a barrier that limits the ability to develop repeatable, evidence-based policy analysis and research in Australia.

CSIRO is developing the Knowledge Network (KN) platform (https://kn.csiro.au), which provides a gateway to data published via a range of data initiatives, including NCRIS and open government data initiatives. KN harvests and indexes known data records from multiple data repositories in government and research. These records are then made available so that anyone can discover, access and share links to data at the collection level and at the individual file or service level, all in one platform.

Having dataset- and file-level information available in the KN platform creates opportunities for researchers to leverage it in online platforms, including data analytics environments (e.g. virtual laboratories or science gateways) as well as web applications tailored for specific communities. KN is currently being used in the ‘EcoScience Research Data Cloud and Data Enhanced Virtual Laboratory’ project (ecocloud for short) [1] to enable discovery and access to third-party data for use with the ecocloud compute platform. In particular, KN powers discovery and access via the ecocloud explorer, which displays a tailored set of search results relevant to the ecological science domain. This allows ecocloud users, such as researchers or policy analysts, to discover and access relevant data in the ecocloud explorer and obtain code snippets for its use in ecocloud compute environments. Beyond ecocloud, the current APIs provide the means for other projects and initiatives to offer a tailored view of data drawn from a comprehensive superset that aims for national coverage.
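As an indication of what such a code snippet can look like, the sketch below loads a discovered file-level distribution straight into a pandas DataFrame inside a compute environment; the URL is a placeholder rather than a real KN record.

    # A minimal sketch: once the explorer surfaces a file-level distribution URL,
    # it can be loaded directly into an ecocloud notebook session.
    # The URL below is a hypothetical placeholder.
    import pandas as pd

    distribution_url = "https://data.example.gov.au/dataset/observations.csv"  # hypothetical
    df = pd.read_csv(distribution_url)
    print(df.head())
    print(df.describe())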

Because dataset- and file-level metadata are indexed in KN, there are also opportunities to develop quantitative surveys of the data landscape, particularly in Australia, to analyse and report on its current state [2,3]. Understanding the current state of the data landscape gives greater data-driven insight into trends and gaps in data initiatives over time, based on the metadata and the datasets themselves. Specifically, it allows a data-driven picture of emerging topics and activities for specific scientific and research communities, as well as for public and private sector agencies. This in turn creates opportunities to assess improvements in future initiatives based on data-driven insights.

In this presentation, we provide an overview of the KN technical architecture, its use in a virtual laboratory context, and a discussion around data-driven insights that can be gained from the KN platform to inform a ‘state of the data’ picture for Australia.


  1. EcoCloud, https://www.ecocloud.org.au, accessed 20 June 2018
  2. Yu, J., et al., Survey of open data and research data in the Australian context via the CSIRO Knowledge Network, eResearch Australasia, Brisbane, Australia, October 2017
  3. Yu, J., et al., Visualising the Australian open data and research data landscape, Collaborative Conference on Computational and Data Intensive Science, 2018 (C3DIS 2018), Melbourne, Australia, May 2018, DOI: 10.13140/RG.2.2.33826.32964


Dr Jonathan Yu is a data scientist researching information and web architectures, data integration, Linked Data, data analytics and visualisation and applies his work in the environmental and earth sciences domain. He is part of the Environmental Informatics group in CSIRO Land and Water. He currently leads a number of initiatives to develop new approaches, architectures, methods and tools for transforming and connecting information flows across the environmental domain and the broader digital economy within Australia and internationally.

Changes in national ethics policy for managing and sharing human research data

Kate LeMay 1

1Australian Research Data Commons, Canberra, Australia, kate.lemay@ands.org.au


There is a strong national and international movement from both funders and publishers of research, in particular medical research, towards requiring digital data outputs of research to be well managed and available for appropriate reuse by other researchers. Institutional ethics policies also play a key role in determining how long and where data should be retained, and if and how they can be shared. These ethics policies are based upon the National Statement on Ethical Conduct in Human Research, which is owned by the National Health and Medical Research Council (NHMRC).

This session will examine the new version of the National Statement on Ethical Conduct in Human Research, and ways in which institutions, ethics committees and researchers can comply with the new requirements for data management and sharing.

Managing access to shared data

The Five Safes [1] framework for managing access to data is an excellent basis for planning how to manage access to sensitive data. By considering the five aspects of projects, people, data, settings and outputs, it addresses the risks in each of these areas and offers a variety of ways to manage access.

Research data can be openly described in data repositories without making the data openly available; this is called mediated access. This concept is consistent with the approach of making data Findable, Accessible, Interoperable and Reusable (FAIR) [2], and can be part of the Five Safes framework. There are many ways of mediating access to sensitive data, and some examples will be given in this session.


Sufficient and voluntary consent for data sharing is vital. Controls around governance, access, use, release, confidentiality and privacy of the data should be made clear during the ethical approval process, and also to participants in the research when obtaining consent. Appropriate consent must be obtained from participants for the reuse of research data. Strategies to incorporate data sharing into the ethical approval and consent processes will be discussed.

When research data are reused, the reuse must comply with the consent originally obtained from the participants. It may be appropriate to offer participants levels of consent, e.g. levels of identifiability or aggregation at which their data may be made available for reuse.

Often researchers are concerned that participants will not consent to their research if they ask for permission to share the data after the conclusion of the project. However, there is a growing body of research around positive participant attitudes towards data about them, even medical data, being reused for research purposes.


The management, retention, and appropriate sharing of research data is increasingly recognised as an important part of the research lifecycle. This is being recognised in national policies, such as the new version of the National Statement on Ethical Conduct in Human Research. Ways in which institutions and researchers can appropriately manage and share human research data will be outlined.


  1. Desai, T., Ritchie, F. and Welpton, R. Five Safes: designing data access for research. 2016. DOI: 10.13140/RG.2.1.3661.1604
  2. FAIR data. Available from: http://www.ands.org.au/working-with-data/fairdata, accessed 30 May 2018.


Kate LeMay began her career as a Pharmacist, working in both community and hospital settings. She moved on to the University of Sydney and Woolcock Institute of Medical Research, where she worked on community pharmacy based programs to assist patients with chronic disease management. Kate is now in Canberra, Australia, at the Australian Research Data Commons (ARDC) as a Senior Research Data Specialist, focusing on health and medical data.

Medical Imaging: Federation and Compute

Chris Albone1, Ryan P Sullivan1,2

1Information and Communications Technology, University of Sydney, Sydney Australia

2Core Research Facilities, DVC-R, University of Sydney, Sydney Australia

chris.albone@sydney.edu.au, ryan.sullivan@sydney.edu.au



XNAT is an imaging data platform that has been rapidly gaining popularity throughout Australian research institutions and facilities, and worldwide [1]. It has been adopted as part of the National Imaging Facility (NIF) Trusted Data Repository (TDR) program to provide a standard framework for medical imaging and data provenance.

Similar efforts are underway on the computational component with the Characterization Virtual Labs (CVL) under the Data Enhanced Virtual Lab (DeVL) program funded by NRDC, providing a workbench dedicated to neuroimaging. NIF@UQ has also been working on a DICOM2Cloud project to facilitate automated anonymization of data for computation on public cloud environments.

The University of Sydney is using XNAT as a key component of our Imaging Data Service and has combined it with compute on our HPC and VRD, as well as the CVL and GUI informatics pipeline platforms. We are also a participant in the C-DeVL program, developing a Windows version of the CVL workbenches. Research is inherently multi-institutional, and projects will span multiple repositories and computation infrastructures. We would like to raise the natural question of federating these aligned projects.


We propose a 60-minute roundtable with representatives of institutions that have deployed XNAT, or are looking at deploying XNAT systems. The roundtable will discuss the following:

  1. What is the current status of deployments? Plans for the immediate future. (20 min)
  2. What might XNAT federation look like? Federated metadata search? Federated data search? (15 min)
  3. CVL is being federated. What about other characterization and informatics workflow platforms? A shared repository of Singularity/Docker pipelines to use in XNAT and/or HPC? (15 min)
  4. Should a standard anonymization toolset be adopted when transferring between these repositories and centers of compute? (10 min)


Dr Sullivan is a biophysicist with an interest in neural implants. His research led him into software development for automatic characterization of implants and neural tissue. Dr Sullivan joined the University of Sydney in 2017 where he now works on eResearch projects focusing on characterization domains.

Imaging Data Service: Ingestion, Storage, and Compute

Ryan Sullivan1, Haofei Feng2, Vipul Patel3, Murray-Luke Peard4, Chris Albone5

1University of Sydney, Sydney, ryan.sullivan@sydney.edu.au

2University of Sydney, Sydney, haofei.feng@sydney.edu.au

3University of Sydney, Sydney, vipul.patel@sydney.edu.au

4University of Sydney, Sydney, murray-luke.peard@sydney.edu.au

5University of Sydney, Sydney, chris.albone@sydney.edu.au



Imaging research, clinical, preclinical, or otherwise, is often multisite, multimodal, and compute intensive. XNAT is an imaging data platform that has been rapidly gaining popularity both worldwide and throughout Australian research institutions and facilities [1]. As part of the University of Sydney’s Core Research Facility program, we have developed our Imaging Data Service (IDS) using XNAT as one of the core technologies. IDS is able to ingest, store, and analyse data in an automated and compliant manner to facilitate clinical workflows.

We have connected instruments in the Sydney Imaging Core Facility and I-Med, a local clinical site. We will be expanding to cover instruments in three schools along with additional clinical sites over the coming year. We will discuss challenges we’ve encountered in terms of developing these systems, as well as hurdles in dealing with patient privacy and vendor software.


Acquired images are passed directly from equipment to a Clinical Trials Processor (CTP) or Research Automated Project Allocator & Anonymiser (RAPPA) on site, where direct patient identifiers are stripped in a compliant manner before the data are sent to the XNAT repository. The direct identifiers are stored on site in a way that allows automated re-association of derivative data and analysis results to facilitate clinical workflows. Other patient data not captured at the instrument are stored in a separate REDCap system, linked with a common anonymised key. This allows a finer granularity of control to address the different needs of a variety of projects and sites, based on patient consent. Data from other repositories, such as historical data on our Research Data Share (RDS), may also be batch uploaded to the new system.
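The sketch below illustrates the identifier-stripping idea with pydicom; the production systems here are CTP and RAPPA with site-approved profiles, so the tags handled, the key store and the file paths are illustrative assumptions only.

    # A minimal sketch: strip direct identifiers from a DICOM file and keep a
    # local re-identification key so derivative results can be re-associated on site.
    # Tag selection, key handling and filenames are hypothetical.
    import uuid
    import pydicom

    def anonymise(in_path, out_path, key_store):
        ds = pydicom.dcmread(in_path)
        # Keep a local mapping so results can later be re-associated on site.
        local_key = key_store.setdefault(str(ds.PatientID), str(uuid.uuid4()))
        ds.PatientName = "ANON"          # replace direct identifiers
        ds.PatientID = local_key
        ds.PatientBirthDate = ""
        for keyword in ("OtherPatientIDs", "PatientAddress", "PatientTelephoneNumbers"):
            if hasattr(ds, keyword):     # remove optional identifying tags if present
                delattr(ds, keyword)
        ds.save_as(out_path)
        return local_key

    keys = {}
    anonymise("scan_0001.dcm", "scan_0001_anon.dcm", keys)  # hypothetical filenames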

Once stored in XNAT, researchers may access their data using AAF authentication via a web browser, or through multiple clients and connected platforms using the REST API. We have implemented XNAT’s pipeline engine using containerised workflows run on Artemis, our HPC, as the backend, with the future aim of being able to run on private and public clouds. Alternatively, researchers may use resources such as Argus, Sydney’s Virtual Research Desktop, or the ARDC’s curated Characterization Virtual Lab (CVL). Finally, we look at integration with two informatics platforms, Jupyter Hub and Nipype, through which workflows may be developed. This gives researchers the freedom to choose the desired technologies for their particular workflows.
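For researchers scripting against the repository, the sketch below shows what REST-based access can look like using the community xnatpy client; the server URL, project ID and credential handling are placeholders, and in practice authentication is via institutional credentials.

    # A minimal sketch: connect to an XNAT server, walk a project and download a
    # session for local or HPC processing. All identifiers below are hypothetical.
    import xnat

    with xnat.connect("https://xnat.example.edu.au", user="researcher", password="...") as session:
        project = session.projects["SAMPLE_PROJECT"]          # hypothetical project ID
        for subject in project.subjects.values():
            for experiment in subject.experiments.values():
                print(subject.label, experiment.label)
        # Pull down one imaging session as a zip archive for pipeline processing.
        first = next(iter(project.experiments.values()))
        first.download("/tmp/session.zip")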

From the user’s perspective, this provides a “big green button” solution to analysing their data using tested and curated pipelines, while also providing tools for power users who wish to delve deeper into informatics development.

Figure 1: High level overview of data flow in our Imaging Data Service. Orange are systems belonging to the University of Sydney directly, Blue belong to partner institutions, Green belong to NCRIS capabilities. Partially transparent items are planned over the coming year, but not yet in production.


We continue to look at developing sustainable DevOps frameworks for pipelines to allow the system to be self-sustaining and get ICT “out of the way.”  Next steps are continued rollout to appropriate faculties, improved auditing and reporting frameworks for research integrity, operations ROI, and data provenance. We are also interested in discussing interfacing with other similar systems meeting the TDR standard.


  1. Marcus, D. S., Olsen, T. R., Ramaratnam, M., Buckner, R. L., The extensible neuroimaging archive toolkit. Neuroinformatics, 2007. DOI: 10.1385/NI:5:1:11


Dr Sullivan is a biophysicist with an interest in neural implants. His research led him into software development for automatic characterization of implants and neural tissue. Dr Sullivan joined the University of Sydney in 2017 where he now works on eResearch projects focusing on characterization domains.

Collecting and publishing dataset usage and citations at the ALA

Nick dos Remedios1, Javier Molina2, Simon Bear3, Patricia Koh4

1Atlas of Living Australia, Canberra, Australia, nick.dosremedios@csiro.au

2Atlas of Living Australia, Canberra, Australia, javier.molina@csiro.au

3Atlas of Living Australia, Canberra, Australia, simon.bear@csiro.au

4Atlas of Living Australia, Canberra, Australia, patricia.koh@csiro.au


The Atlas of Living Australia (ALA) [1] is an NCRIS-funded national biodiversity data aggregator. Founded on the principle of open data sharing – collect it once, share it, use it many times – the ALA provides free, online access to over 70 million occurrence records, forming the most comprehensive and accessible dataset on Australia’s biodiversity ever produced. Dataset owners and providers are an important stakeholder group for the ALA, and one of the benefits of sharing their data with the ALA is that the ALA can provide data usage and citation statistics back to them. Each dataset has a metadata web page on the ALA that provides details about the institution, research, contacts and description for that dataset. On this page there is a detailed breakdown of how many user-generated downloads contained records from that dataset, covering the past month, 6 months, 12 months and all time. Recently the ALA added a new feature to data downloads, whereby a DOI is automatically generated for every user download event. Researchers are encouraged to link this data DOI to the DOI of any publication that uses the data. In addition, the ALA has collaborated with the Global Biodiversity Information Facility (GBIF) [2] to allocate a DOI to a large percentage of datasets, with the aim of covering all datasets in the near future. By using citation linking tools, download DOIs can be linked to their dataset DOIs, and thus it will be possible to track publications via the DOI chain back to each dataset.
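For context on the occurrence records themselves, the sketch below queries the ALA’s public occurrence search web service; the endpoint follows the documented biocache API, but the query, parameters and fields shown are illustrative assumptions rather than a tested call.

    # A minimal sketch: free-text occurrence search against the ALA biocache
    # web services. Query, parameters and field names are illustrative only.
    import requests

    resp = requests.get(
        "https://biocache.ala.org.au/ws/occurrences/search",
        params={"q": "Phascolarctos cinereus", "pageSize": 5},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()

    print("Total matching records:", result.get("totalRecords"))
    for occ in result.get("occurrences", []):
        print(occ.get("scientificName"), occ.get("decimalLatitude"), occ.get("decimalLongitude"))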


  1. Atlas of Living Australia (ALA) – https://www.ala.org.au/
  2. Global Biodiversity Information Facility (GBIF) – https://gbif.org/


Nick completed a PhD in comparative immunology at UTS before taking up software development in the airline industry. He then worked for an IP-focused, not-for-profit research NGO called CAMBIA before taking up a role as senior developer at the Atlas of Living Australia (CSIRO), where he now works.
