Collaborative Research in Practice – a roadmap to using Git and Docker

Convener: Rebecca Lange
Presenters: Dr Rebecca Lange1, Mark Gray2, Brian Skjerven2

1Curtin Institute for Computation, Curtin University, Perth, Australia,
2Pawsey Supercomputing Centre, Perth, Australia


  • Workshop Length: 1 day
  • Hands-on component: Yes, we are aiming for half of the workshop being hands-on
  • Attendees: less than 40


The Internet has made it easy for researchers to collaborate across universities using tools such as email, instant messaging and videoconferencing. However, these tools do not encourage transparency and reproducibility of research. For example, what happens if you forget to include someone on an email chain, and how do you reconcile the different code snippets that various team members have worked on?

In this workshop we will first have a look at the modern scientific landscape in which code plays an increasingly important role in obtaining and reproducing research results.

We then move on to a hands-on introduction to version control (with Git and GitHub) and “containerising” with Docker. Finally, we will conclude the workshop with a session on using version control and Docker in practice for a collaborative and portable workflow.

1. Welcome, introductions and workshop outline.

The importance of software in research and academia. 30 minutes

2. Introduction to version control with Git and GitHub – Part I. 60 minutes

  • What is version control
  • Setting up Git
  • Creating your first repository
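
A minimal terminal sketch of these first steps, assuming Git is installed (the name, email and directory below are placeholders):

```shell
# One-time Git configuration (placeholder identity)
git config --global user.name "Ada Lovelace"
git config --global user.email "ada@example.com"

# Create a new, empty repository
mkdir my-first-repo
cd my-first-repo
git init
```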

3. Break – 30 minutes

4. Introduction to version control with Git and GitHub – Part II. 90 minutes

  • Tracking changes
  • Exploring the repository history
  • Remote repositories on GitHub
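
A minimal sketch of the Part II commands, again assuming Git is installed; the repository, file, commit message and GitHub URL are placeholders:

```shell
# Throwaway repository so the example is self-contained
mkdir demo && cd demo
git init
git config user.name "Ada Lovelace"        # placeholder identity
git config user.email "ada@example.com"

# Track a change and explore the history
echo "print('hello')" > analysis.py
git add analysis.py
git commit -m "Add first analysis script"
git log --oneline                          # one line per commit

# Link a remote on GitHub (placeholder URL; pushing needs credentials)
git remote add origin https://github.com/username/demo.git
# git push -u origin main
```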

5. Lunch Break – 60 minutes

6. Introduction to containers using Docker and Docker Hub. 60 minutes

  • What is Docker
  • Creating a Docker container
  • Running a Docker container
  • Containers on Docker Hub
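
A container recipe lives in a Dockerfile; a minimal, hypothetical example (the base image and script name are illustrative, not from the workshop materials):

```dockerfile
# Dockerfile: package a small Python analysis script
FROM python:3.9-slim               # official slim Python base image
COPY analysis.py /app/analysis.py  # copy the script into the image
CMD ["python", "/app/analysis.py"] # command run when the container starts
```

It could then be built with `docker build -t username/analysis .`, run with `docker run --rm username/analysis`, and shared on Docker Hub with `docker push` after `docker login` (all names are placeholders).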

7. Break – 30 minutes

8. A collaborative (software/coding) workflow in practice. 120 minutes

  • Working collaboratively on GitHub
  • Managing containers and Dockerfiles with Git
  • Managing your workflows using containers
  • Where to next? Pointers to further reading and learning
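
The collaborative cycle above can be sketched as follows; a local bare repository stands in for the shared GitHub remote so the commands are self-contained, and all names are placeholders:

```shell
# A bare repository plays the role of the shared remote on GitHub
git init --bare "$PWD/shared.git"
git clone "$PWD/shared.git" project
cd project
git config user.name "Ada Lovelace"
git config user.email "ada@example.com"

# Work on a feature branch, then publish it for review
git checkout -b fix-plot-labels
echo "labels fixed" > notes.txt
git add notes.txt
git commit -m "Fix axis labels"
git push -u origin fix-plot-labels   # on GitHub: now open a pull request
```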


This workshop is aimed at academics/researchers who regularly write and/or work with software, and research support staff who would like to know more about useful tools for code and workflow management. We will cover:

  • how to track changes,
  • how to make sharing of code and software easier, and
  • how to manage issue tracking and workflows in a collaborative software development environment.


Attendees will need to bring their own laptop with some software pre-installed.

It would be beneficial if participants have used a bash terminal (or similar) before.

We will provide information on what to install, and how to install it, closer to the time.


Rebecca Lange received her PhD in astronomy from the International Centre for Radio Astronomy Research at the University of Western Australia.

Before Rebecca moved to Australia she studied Astronomy and Physics at Nottingham Trent University where she also worked as a research assistant in scientific imaging for art conservation and archaeology. Her work there included the development and testing of instruments and software for imaging and spectroscopy as well as the organisation and supervision of field trips, which often required liaising with art curators and conservators.

Throughout her studies and research Rebecca has gained extensive programming as well as data analytics and visualisation experience in various programming languages.

Currently she is working as a data scientist for the Curtin Institute for Computation where she helps researchers by providing data analytics and computational support and training.


From biodiversity to infrastructure: lessons learned repurposing the Atlas of Living Australia technology stack for climate risk decision making

Mr Paul Box1, Jonathan Yu2, Andrew Freebairn3, Ashley Sommers4, Peter Brenton5, Mark Stafford Smith3, Russ Wise3, Rachel Williams3

1CSIRO, Sydney, Australia,

2CSIRO, Melbourne, Australia,

3CSIRO, Canberra, Australia,

4CSIRO, Brisbane, Australia,

5Atlas of Living Australia, Canberra, Australia


Governments, communities and the private sector face rising costs from natural disasters and climate change. Population growth, coupled with the effects of climate change, creates systemic risks affecting community health and wellbeing, the resilience of the Australian economy, and government service delivery. A significant proportion of the social and financial costs of these risks flows back to the Commonwealth: in growing levels of disaster relief and as the ‘insurer of last resort’; through compromised policy outcomes, which undermine the public’s trust in government; and ultimately through reduced tax revenues due to declines in national economic productivity and sector competitiveness. There is a wealth of data and information available on the current and future climate, natural hazards, and ways to adapt to climate change. Decision makers, however, have said that this information is not accessible, discoverable or usable for decision-making, or, if it is, that it is difficult to know which information is authoritative and trustworthy and which to use when. Furthermore, there is currently little guidance on how to bring climate risk into existing business processes such as cost benefit analysis for infrastructure investment.


To address these challenges, CSIRO was commissioned by the Commonwealth Department of Environment and Energy (DoEE) to develop a prototype Climate Risk Information and Services Platform (CRISP). CRISP is intended to provide a trusted and sustainable mechanism to enable access to authoritative information, and guidance to facilitate planning and decision-making for a more climate-resilient Australia [1]. By providing best available information about climate risks, coupled with leading practice assessment processes, CRISP aims to influence and support key areas of Commonwealth decision-making to improve the nation’s resilience to climate risk.

A user-centred design approach applying the Digital Transformation Agency (DTA) Digital Service Standard was adopted. Two main suites of functions were identified through user engagement: a spatial data discovery and exploration tool, enabling discovery and exploration of available spatial information; and a configurable workflow specification tool, enabling risk assessors and managers to work with climate adaptation scientists to develop climate risk decision-making workflows that align with, and can support or improve, existing business processes. The primary motivating use case for CRISP identified through user design was enabling climate risk to be incorporated into physical infrastructure investment decision-making.

The project approach was informed by previous work in the climate adaptation community (including CSIRO’s National Climate Adaptation Flagship, the National Climate Change Adaptation Research Facility, and others) that identified the proliferation of unsupported data and process tools as creating confusion for potential user communities [2]. Therefore, despite being an alpha prototype, the project team was keen to ensure that, where possible, the prototype was built in a way that could be scaled up to an operational system sustained and supported long term, rather than being a ‘throw-away’ proof of concept.


The Atlas of Living Australia (ALA or Atlas) is an e-infrastructure that is funded by the Australian Government via its National Collaborative Research Infrastructure Strategy (NCRIS). It comprises a centralised web-based infrastructure to capture, aggregate, manage, discover and analyse biodiversity data and associated information, through a suite of tools and spatial layers for use by research, industry, government and the community. It is an open source technology stack with micro-services component architecture.

Based on identified user requirements, the Atlas of Living Australia was found to provide much of the functionality required for CRISP and could be repurposed for the project. Of particular interest was the data collection platform, BioCollect, which provides a flexible and configurable project-centric, forms-based capability to support user-defined field data collection surveys. This tool also has a structure that can readily support extensible workflow requirements.


An agile software development approach was used to develop the alpha prototype in the first phase of the project. An instance of BioCollect was deployed, with an identified subset of microservices “dockerised” for deployment.

Climate risk workflows were rapidly prototyped in PowerPoint, tested with users and used as specifications for workflow configuration. However, the BioCollect information model underpinning the surveys, in its present form, proved insufficient to meet the needs of complex climate risk decision-making workflows. In addition, there was a high transaction cost (‘repurposing technical debt’) in customising BioCollect user interfaces in a rapid proof-of-concept alpha prototype.

Consequently, it was decided that a separate user interface (UI) in Java and HTML, with an underlying workflow persistence layer using MongoDB, would be developed. These would be treated as high-fidelity prototypes to be tested with users and used to inform customisation and extension of the BioCollect codebase in the next phase of the project, the beta prototype.


From this process there are a number of lessons to be learned about how to: build platforms to maximise their potential for reuse and repurposing; assess platforms for reuse; estimate potential hidden transaction costs (repurposing technical debt); and manage risks through rapid prototyping and identifying key decision points along the development pathway described above.

This presentation will describe this journey in more detail and articulate some key lessons learned, along with their implications for the reuse and repurposing of eResearch infrastructure to maximise return on investment.


1 Australian Government, National Climate Resilience and Adaptation Strategy 2015. 2015, Canberra, Australian Capital Territory.

2 Webb, R. & Beh, J., Leading adaptation practices and support strategies for Australia: An international and Australian review of products and tools, National Climate Change Adaptation Research Facility, Gold Coast, 105 pp.


Paul works for CSIRO Land and Water and leads research into the social, institutional and economic dimensions of public information infrastructure (or systems of systems). He is developing an interdisciplinary ‘social architecture’ approach to understanding and designing conducive environments for infrastructure development.

These approaches are being applied in the environmental and other domains to enable efficient data supply and digital transformation.

Paul has a background in geospatial informatics and has been involved in the research, design and implementation of geospatial information infrastructure at global and national scales for the past 15 years.  Prior to joining CSIRO, Paul worked for nearly two decades in Asia and Africa for the UN and government, designing, implementing and managing geospatial capabilities to support sustainable development and humanitarian response.

Deployment Pipelines in eResearch: challenges and successes

Mr Nick Rossow1, Heidi Perrett2, Mr Jan Hettenhausen3

1Griffith University, Nathan, Australia


In the current research environment, application development has become a critical element of research workflows. Countries like Australia are investing heavily not only in hardware infrastructure but also in the application platforms required to undertake research. Similarly, within institutions, as research teams grow and data flows increase, the need for software applications is increasing, along with the requisite level of software engineering practice.

Software engineering practices have undergone a significant evolution over time, with Agile methodologies currently the dominant approach in many industries, including eResearch. One of the newer models under the umbrella of Agile approaches is continuous delivery, in which software is developed with the idea of being releasable at any time. To achieve this, continuous delivery relies heavily on automating the testing, building and deployment of code in a pipeline-style process. Some of the benefits, particularly in the context of developing applications in close collaboration with researchers, are that projects can evolve more quickly while maintaining a high quality standard. These benefits persist after the product is handed over to support, as the automation provides confidence in software and hardware updates and patches.
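
As an illustration, such a pipeline is typically declared alongside the code; the YAML below is a hypothetical sketch in a GitLab-CI-like syntax, and the stage names, scripts and registry are assumptions rather than Griffith's actual configuration:

```yaml
# Hypothetical continuous delivery pipeline: every push is tested and
# built automatically, so the code stays releasable at any time
stages: [test, build, deploy]

test:
  stage: test
  script: ./run_tests.sh            # automated test suite gates every change

build:
  stage: build
  script: docker build -t registry.example.org/app:$CI_COMMIT_SHA .

deploy:
  stage: deploy
  script: ./deploy.sh production
  when: manual                      # releasable at any time; releasing stays a choice
```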

At Griffith University, eResearch Services provide specialist IT support for researchers across all Griffith Schools and Research Centres. Our activities are split between software development for niche research needs, providing services such as high performance computing, hosting and maintaining broadly applicable software applications such as research storage and data collection tools, alongside programming workshops and media productions for research projects.

By implementing a continuous deployment pipeline, our goal was to be able to quickly and efficiently deploy new applications and patches to existing applications, and to quickly scale the resources available to a project team. It was our experience that the overhead of adding an additional developer to a project far outweighed any possible benefit this would achieve in a reasonable timeframe. By implementing a Continuous Delivery model, we have been able to successfully add a new developer to a project and see productive input in a matter of hours instead of days.

In our presentation, we will discuss the challenges and successes we faced over the past 18 months as we moved towards a Continuous Delivery model for developing solutions for niche research needs. To highlight some of these challenges and successes, we will look at the past 18 months from three distinct vantage points: that of the Project Manager, the Developer and the Technical Lead. We will talk about the main drivers for change and delve into some of the key benefits we have already seen and expect to see in the near future.

This presentation will be of interest to anyone who wants to improve internal processes, as well as those involved in the development process, from end users to project managers and developers.


Nick Rossow is the acting manager of the Consultancy and Development eResearch group at Griffith University. Nick has worked at Griffith University for 11 years, including the last 7 in the eResearch team, with substantial contributions to research projects in sectors such as health, criminology, humanities and social sciences.

Recently, Nick has taken on the challenge of managing the eResearch Consultancy and Development team, being involved in all areas of research projects and governance.

Describe, Manage and Discover Research Software

Jens Klump1, Sue Cook2, Ryan Fraser3, David Lescinsky4, Mingfang Wu5, Lesley Wyborn6

1 CSIRO, Perth, Australia,
2 CSIRO, Perth, Australia,
3 CSIRO, Perth, Australia,
4 Geoscience Australia, Canberra, Australia,
5 ANDS, Melbourne Australia,
6 NCI, Canberra, Australia,



Software plays a critical role in data-driven research, where it is developed for cleaning, processing, analysing and visualising data. Software developed and used as part of the research cycle is increasingly recognised as an important component of research reproducibility. Thus, it should be treated in the same way as other research inputs and outputs that form part of the record of science, such as research data and publications. However, there is not yet an established process in scholarly communication for properly managing and publishing software for better discovery, reusability and reproducibility.

Many funding agencies and communities recognise the important role that software plays in the research lifecycle. For example, the FORCE11 Software Citation Working Group has developed and publicised six software citation principles [1]: importance; credit and attribution; unique identification; persistence; accessibility; and specificity (versioning). A new FORCE11 Software Citation Implementation Working Group [2] has formed and commenced implementing the principles. In a recent Software Source Code BoF at the 9th Research Data Alliance Plenary, there were constructive discussions on the archival, discoverability and reproducibility of software source code [3]. However, more discussion of implementation issues, such as human- and machine-readable descriptions, systematic unique identification and licensing, is needed to build community consensus.
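
One concrete way to act on these principles is to ship machine-readable citation metadata alongside the source code; a minimal, hypothetical example in the CITATION.cff format (all values below are placeholders):

```yaml
# CITATION.cff: machine-readable citation metadata kept in the repository,
# covering identification, versioning and credit
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ExampleTool"
version: "1.4.0"
doi: 10.5281/zenodo.0000000        # placeholder persistent identifier
authors:
  - family-names: Lovelace
    given-names: Ada
```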

This 60-minute BoF will provide an overview of activities and challenges in managing and describing software, followed by three presentations on current issues, practices and experience. The presentations will be followed by a group discussion on the barriers people face in managing and describing software; the outcome of this discussion may be actions for various software interest or working groups, including the Australian software citation Interest Group [4].

The proposed outline is as follows:

  • Introduction to the session
  • Three short presentations by Sue Cook, David Lescinsky, and Ryan Fraser on current issues, practices and experience in managing and describing software for the purpose of discovering and reusing software
  • Group discussion: barriers in managing and describing software
  • Next steps and wrap up with actions


  1. Smith A. M., Katz D. S., Niemeyer K. E., FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI:10.7717/peerj-cs.86.
  2. Force 11 Software Citation Working Group. Available from:, accessed 27 June 2017.
  3. Working document for RDA BoF on a Software Source Code focus group. Available from:, accessed 27 June 2017.
  4. Australian Research Software Interest Group. Available from:, accessed 27 June 2017.


Jens Klump is a geochemist by training and OCE Science Leader Earth Science Informatics in CSIRO Mineral Resources. Jens’ field of research is the application of information technology to geoscience questions. Research topics in this field are numerical methods in minerals exploration, virtual research environments, high performance and cloud computing, and the development of system solutions for geoscience projects. Jens’ previous work involved building repositories for research data and persistent identifier systems; this sparked further work on research data infrastructures, including the publication and curation of scientific software and its source code. Follow him on Twitter as @snet_jklump.

Sue Cook is a Data Librarian with the Research Data Support team of CSIRO Information Management and Technology. Coming from a science background before becoming a librarian, she has been with CSIRO since 2006. She has interests in new models of scholarly communication in science, data management and using social media for professional development.

Ryan Fraser is a Portfolio Manager with CSIRO, with over 15 years of experience working in R&D, commercialisation of products and delivery to both government and industry using agile engineering methodologies. Ryan has led many Australian eResearch projects, including the AuScope Grid, the Australian Spatial Research Data Commons, VGL, the Virtual Hazards, Impact and Risk Laboratory (VHIRL), and ANDS and NeCTAR projects. Ryan possesses specialised knowledge and current projects in spatial information infrastructures, data analytics, cloud computing, data management and interoperability, and has extensive experience in managing and successfully delivering projects.

David Lescinsky is currently the team lead of GA’s High Performance Data / High Performance Computing Science Team and is responsible for facilitating and managing GA’s eResearch projects, including: GA’s science projects at the National Computational Infrastructure (NCI), GA’s national data collections at the Research Data Storage Infrastructure (RDSI), and GA’s Virtual Laboratories Programme. David has an M.Sc. and a Ph.D. in Earth Sciences and more than 20 years of experience working as a geologist.

Mingfang Wu has been a senior business analyst at ANDS since 2011. Mingfang has been working on a range of ANDS programs such as data capture, data applications, data eInfrastructure connectivity, and trusted research outputs. Mingfang is co-chairing the Research Data Alliance Data Discovery Paradigms Interest Group and two Australian Interest Groups: Data Provenance and Software Citation. Mingfang received her PhD from the School of Computer Science at RMIT University in 2002; she was a senior research fellow at RMIT from 2006 to 2011 and a research scientist at CSIRO from 1999 to 2006, all in the area of information retrieval.

Lesley Wyborn is a geochemist by training and worked for BMR/AGSO/GA for 42 years in a variety of geoscience and geoinformatics positions. In 2014 she joined the ANU and currently has a joint adjunct fellowship with National Computational Infrastructure and the Research School of Earth Sciences. She has been involved in many NCRIS funded eResearch projects over the years. She is Deputy Chair of the Australian Academy of Science ‘Data for Science Committee’ and is co-chair of several RDA Interest Groups as well as a member of the AGU Earth and Space Science Executive Committee.

Synch&Share (“ownCloud et al.”) operator’s BoF

Peter Elford, Mr Guido Aben1, Brett Rosolen

1AARNet, Kensington, Australia



Over the past few years, a number of research and education storage operators/providers and institutions in Australia have opted to augment their storage offerings with a synch&share (“Dropbox-like”) interface, alongside the more traditional interfaces of FTP/NFS/CIFS/S3/Web. A handful of different packages exist to enable such a synch&share interface, but the predominant one is ownCloud and its recent fork NextCloud. Indeed, many Australian research and education storage operators have opted to use an ownCloud or NextCloud package to provide cloud storage services.

The potential synergies between these cloud storage providers warrant an exchange of knowledge, and the following topics will be tabled as discussion items:

  1. Technical as well as procedural and policy concerns – is harmonization desirable and/or attainable?
  • Joint operation and joint articulation of interests to vendors
  • Interoperability and reduction of silo development, in light of the integrative goals of the Australian Research Data Cloud, with specific attention to use of the OpenCloudMesh interoperability standard [1]

The session will be initiated with a short presentation on operations at AARNet and recent experience with OpenCloudMesh, and then opened to the floor for facilitated, interactive discussion as a 90-minute BoF.


  1. Available from:, accessed 23 June 2017


About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
