MyTardis: FAIR data management for instrument data

Conveners: Wojtek J. Goscinski1, Amr Hassan2

Presenters: Andrew Janke3, Andrew Mehnert4, Aswin Narayanan5, Dean Taylor6, James M. Wettenhall7, Jonathan Knispel8, Keith E. Schulze9, Lance Wilson10, Manish Kumar11, Samitha Amarapathy12

1Monash eResearch Centre, Monash University, Melbourne, Wojtek.Goscinski@monash.edu
2Monash eResearch Centre, Monash University, Melbourne, Amr.Hassan@monash.edu
3National Imaging Facility, Centre for Advanced Imaging, The University of Queensland, Brisbane, andrew.janke@uq.edu.au
4Centre for Microscopy, Characterisation and Analysis, The University of Western Australia, Perth, andrew.mehnert@uwa.edu.au
5National Imaging Facility, Centre for Advanced Imaging, The University of Queensland, Brisbane, a.narayanan@uq.edu.au
6Centre for Microscopy, Characterisation and Analysis, The University of Western Australia, Perth, dean.taylor@uwa.edu.au
7Monash eResearch Centre, Monash University, Melbourne, james.wettenhall@monash.edu
8Centre for Microscopy, Characterisation and Analysis, The University of Western Australia, Perth, jonathan.knispel@uwa.edu.au
9Monash eResearch Centre, Monash University, Melbourne, keith.schulze@monash.edu
10Monash eResearch Centre, Monash University, Melbourne, lance.wilson@monash.edu
11Monash eResearch Centre, Monash University, Melbourne, manish.kumar@monash.edu
12Monash eResearch Centre, Monash University, Melbourne, samitha.amarapathy@monash.edu

GENERAL INFORMATION

  • Workshop Length: One Day
  • This workshop will have the last 2 hours as a hands-on component

DESCRIPTION

Research data management platforms aim to meet the challenges of capturing and managing large volumes of research data, while ensuring that the data is Findable, Accessible, Interoperable and Reusable (FAIR). One such platform is MyTardis (https://www.mytardis.org), an open source research data management platform that was initially established to handle and store macromolecular crystallography data {Meyer:2014ub, Androulakis:2008ku}. Through several national projects, such as the NeCTAR Characterisation Virtual Laboratory (https://www.massive.org.au/cvl), ImageTrove (http://projects.ands.org.au/id/ERIC08) and the ANDS Trusted Data projects (https://projects.ands.org.au/id/GFA16), MyTardis has evolved into a general-purpose research data management system, with a focus on integrating scientific instruments and instrument facilities. It is used across light microscopy, electron microscopy, proteomics, cytometry, magnetic resonance imaging (MRI), positron emission tomography (PET), and other scientific techniques. It integrates over 100 Australian instruments across Monash University, University of Queensland, University of Newcastle, University of New South Wales, RMIT, and University of Western Australia.

In this workshop, representatives from the Characterisation community will share their experience in developing and operating large deployments of MyTardis. We will emphasise how MyTardis helps to securely store and manage data from a variety of different instruments. We will also outline the short- to medium-term roadmap for MyTardis development and our plan to engage the wider community to help us build the next-generation platform for instrument data management. Finally, we will run a hands-on workshop on best practices for deploying and operating MyTardis, specifically targeted at developers and system administrators.

Workshop Contents:

Overview of MyTardis and its deployments

  • Overview of MyTardis
  • Developing and operating MyTardis at Monash University
  • NIF Trusted Data Repositories
  • Developing and operating MyTardis at the University of Queensland and NIF
  • Developing and operating MyTardis at the University of Western Australia
  • Developing and operating MyTardis at the University of Newcastle
  • MyTardis features for instrument facilities

Future Roadmap

  • The Future of MyTardis
  • Requirements from instrument facilities
  • Addressing FAIR by integrating with the experiment, trusted data repositories.
  • Panel Discussion / BoF: Next-generation instrument data, future and challenges

Hands On

  • Hands-on session on deployment of MyTardis

WHO SHOULD ATTEND

  • Instrument facility managers
  • Data Managers
  • IT Managers & Directors
  • Professionals in associated disciplines
  • Research Computing Specialists
  • Research Managers
  • University Representatives
  • Researchers
  • Librarians
  • Software & App engineers

WHAT TO BRING

Attendees need to bring a laptop.


BIOGRAPHIES

Dr Wojtek James Goscinski is the coordinator of MASSIVE, a national high performance computing facility for data science, and Associate Director at the Monash eResearch Centre, a role in which he leads teams to develop and implement digital strategies to nurture and underpin next-generation research. He holds a PhD in Computer Science, a Bachelor of Design (Architecture), and a Bachelor of Computer Science.

Dr Amr Hassan is the eResearch Delivery leader at the Monash eResearch Centre. He leads a team of eResearch professionals to ensure the delivery of high-quality ICT services, projects and programmes that enable the achievement of the eResearch strategic agenda of Monash University. He holds an interdisciplinary PhD in Computational Sciences, an M.Sc in Scientific Computing, and a B.Sc. of Computer Science.

 

Systems Administration in Research Computing

Conveners: Mr Greg Lehmann1, Mr Jake Carroll2

Gin Tan3, Dr Robert Bell4, Michael Mallon6, Linh Vu7, Steve McMahon5

1CSIRO, Pullenvale, Australia, Greg.Lehmann@csiro.au
2The University of Queensland, St. Lucia, Australia, Jake.Carroll@uq.edu.au
3Monash University, Melbourne, Australia, Gin.Tan@monash.edu
4CSIRO, Melbourne, Australia, Robert.Bell@csiro.au
5CSIRO, Canberra, Australia, Steve.McMahon@csiro.au
6The University of Queensland, Brisbane, Australia, Michael.Mallon@uq.edu.au
7The University of Melbourne, Melbourne, Australia, vul@unimelb.edu.au

GENERAL INFORMATION

The workshop will be a full-day event, without a hands-on component. There are no limits on the number of attendees. There are no special equipment requirements.

DESCRIPTION

Research Computing uses tools and techniques that are specialized in nature. Systems administrators working with these tools and the scientists who use them have a different skill set to the norm in IT. This workshop will present information in this area and showcase use cases with the aim of knowledge transfer between practitioners.

1. Workshop introduction and site introductions. 5 minutes per site e.g.

a. Pawsey
b. CSIRO
c. NCI
d. DST
e. Monash
f. Swinburne
g. CQU
h. From the floor

2. Space/data management techniques. Flushing, quotas and HSM with encapsulation. Data life cycle, dataset concept. Exclude publication of datasets. – various – Rob Bell, Greg Lehmann, David Rose
45 mins

BREAK

3. BeeGFS Use Cases in Australian HPC – Jake Carroll and Greg Lehmann
(1) Filesystems for accelerated computing – Australia’s first all flash BeeGFS production environment

Analysis and system observability have made it evident that accelerated supercomputing presents a new kind of challenge to filesystems. This presentation discusses the challenges the University of Queensland faced in scaling DL, AI, ML and deconvolution workloads, and the pressures these workloads placed on traditional parallel filesystems. It then details the architectural considerations, workloads and corner cases that led to an RDMA all-flash BeeGFS implementation.

(2) CSIRO’s new scratch FS – a first look a couple of months in.
30 mins

4. A Year with CephFS for HPC – Linh Vu
This presentation discusses the findings and challenges that the University of Melbourne experienced within a year of implementing CephFS as the storage solution for our growing HPC service. I will talk about our journey from a small POC 6-node 768TB (raw) NLSAS cluster to over 10 times the size, with a mix of NLSAS, SAS SSD and NVME SSD storage pools to cater for different workloads. I will address the design, technical and managerial challenges we have had to face to bring a relatively unknown filesystem to HPC, which we are now heavily investing in.
30 mins
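
To put the capacity figures in the abstract above in perspective, the following is a rough back-of-the-envelope sketch. Only the 6-node and 768TB (raw) figures come from the abstract; the replication factor of 3 is an assumption (it is Ceph's default size for replicated pools, but the abstract does not say what the Melbourne cluster uses), and the function names are purely illustrative.

```python
def per_node_tb(raw_tb: float, nodes: int) -> float:
    """Average raw capacity per storage node."""
    return raw_tb / nodes


def usable_capacity_tb(raw_tb: float, replicas: int = 3) -> float:
    """Approximate usable capacity of a replicated pool.

    replicas=3 is an ASSUMPTION (Ceph's default replicated pool size);
    erasure-coded pools would give a different ratio.
    """
    return raw_tb / replicas


if __name__ == "__main__":
    raw_tb, nodes = 768, 6  # figures quoted in the abstract
    print(f"Raw per node: {per_node_tb(raw_tb, nodes):.0f} TB")
    print(f"Usable at 3x replication: {usable_capacity_tb(raw_tb):.0f} TB")
```

Real clusters also reserve headroom for recovery and rebalancing, so the practical figure would be lower than this estimate.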

5. Efficiently sharing data between HPC and cloud computing platforms – Michael Mallon
One of the guiding principles of the Medici project is to make where data lives somewhat independent from how a researcher might want to consume data. Adhering to this principle enables researchers to choose the most appropriate tool for a particular part of a workflow without incurring a mirroring or replication overhead. One of the more difficult places to adhere to this principle is the intersection of cloud computing and HPC resources in workflows. I’ll talk about how we’ve addressed this using GPFS’s unified object and file interface and SwiftHLM.
30 mins

LUNCH

6. Ansible for Cluster Build – Gin Tan
The new M3 cluster is a bit different to a traditional HPC cluster. The cluster sits on the Monash research cloud and instances are provisioned with Ansible, an approach we call cluster-in-a-box. The idea is to be able to provision a cluster anytime and anywhere we want.
30 mins
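
As a minimal sketch of what provisioning cloud instances with Ansible can involve, the script below builds the JSON structure that Ansible accepts from a dynamic inventory script (groups with a `hosts` list, plus `_meta.hostvars`). The host names, group names, the `slurm_role` variable and the `cloud-user` account are all hypothetical; the abstract does not describe how the M3 inventory is actually generated.

```python
import json


def build_inventory(compute_count: int) -> dict:
    """Build an Ansible dynamic-inventory dictionary for a small cluster.

    All names here are hypothetical placeholders, not the M3 layout.
    """
    login = ["login-01"]
    compute = [f"compute-{i:02d}" for i in range(1, compute_count + 1)]
    return {
        "login": {"hosts": login},
        "compute": {"hosts": compute, "vars": {"slurm_role": "worker"}},
        # _meta.hostvars lets Ansible read per-host variables in one call.
        "_meta": {
            "hostvars": {h: {"ansible_user": "cloud-user"} for h in login + compute}
        },
    }


if __name__ == "__main__":
    # Ansible invokes inventory scripts with --list and reads JSON from stdout.
    print(json.dumps(build_inventory(3), indent=2))
```

Because the inventory is generated rather than hand-written, the same playbooks could in principle be pointed at a freshly created set of instances each time a cluster is built.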

7. OpenHPC Experiences on the UQ Wiener cluster – Jake Carroll
30 mins

8. Using Bright Cluster Manager to streamline and improve HPC operations – Steve McMahon
Managing HPC systems can be complex.  There’s a lot happening and a lot of things to check to make sure they are working correctly.  This talk is about how using a product like Bright Cluster Manager can simplify HPC operations, check for common problems and improve service levels.
30 mins

BREAK

8. Slurm on OzSTAR at Swinburne – Chris Samuel
This short talk will cover how we use Slurm on Swinburne’s OzSTAR GPU cluster. It will cover what plugins we use, and why, as well as how we try to balance the various competing requirements for scheduling our workload through fair-share, partition configurations and our Lua job submit plugin. If time permits it will also cover as yet unsolved problems we wish to address.
30 mins
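
For readers unfamiliar with fair-share scheduling, here is a simplified sketch of the idea, based on the classic fair-share formula from Slurm's multifactor priority plugin, F = 2^(-U/S), where U is an account's normalised usage and S its normalised share. The abstract does not say which fair-share algorithm or weights OzSTAR uses (newer Slurm versions default to the Fair Tree algorithm), so the weights below are hypothetical and only two of the priority factors are modelled.

```python
def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Classic fair-share factor: 1.0 for no usage, 0.5 when usage equals share."""
    if normalized_shares <= 0:
        return 0.0
    return 2.0 ** (-normalized_usage / normalized_shares)


def job_priority(
    age_factor: float,
    fairshare: float,
    weight_age: int = 1000,          # hypothetical PriorityWeightAge
    weight_fairshare: int = 10000,   # hypothetical PriorityWeightFairshare
) -> int:
    """Weighted sum of two multifactor terms (other factors omitted)."""
    return int(weight_age * age_factor + weight_fairshare * fairshare)


if __name__ == "__main__":
    for usage in (0.0, 0.25, 0.5):
        f = fairshare_factor(usage, normalized_shares=0.25)
        print(f"usage={usage:.2f} fairshare={f:.2f} priority={job_priority(0.5, f)}")
```

The exponential form means that an account using exactly its share sits at 0.5, and heavy recent users drop towards zero rather than being blocked outright.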

9. Scheduling containers in the cloud and HPC – Gin Tan
This talk covers how we use the same container to run jobs in both Kubernetes and Slurm. The idea is to burst HPC workloads into the cloud, and we welcome suggestions from the audience. The workload will be as simple as running TensorFlow in the container.
30 mins
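
One way to picture running the same container under both schedulers is to generate a Kubernetes Job manifest and a Slurm batch script from a single image name, as in the sketch below. This is an illustration under assumptions, not the presenters' method: the abstract does not say which container runtime is used on the Slurm side, so `singularity exec docker://...` is assumed here, and the job name and command are hypothetical.

```python
import json
import shlex

IMAGE = "tensorflow/tensorflow:latest"
COMMAND = ["python", "-c", "import tensorflow as tf; print(tf.__version__)"]


def kubernetes_job(name: str, image: str, command: list) -> dict:
    """Minimal batch/v1 Job manifest that runs the container once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image, "command": command}],
                    "restartPolicy": "Never",
                }
            }
        },
    }


def slurm_script(name: str, image: str, command: list) -> str:
    """Batch script that runs the same image (ASSUMES Singularity on the cluster)."""
    return "\n".join(
        [
            "#!/bin/bash",
            f"#SBATCH --job-name={name}",
            f"singularity exec docker://{image} {shlex.join(command)}",
            "",
        ]
    )


if __name__ == "__main__":
    print(json.dumps(kubernetes_job("tf-demo", IMAGE, COMMAND), indent=2))
    print(slurm_script("tf-demo", IMAGE, COMMAND))
```

Keeping the image and command in one place means the two submissions cannot drift apart, which is the property that makes bursting between environments practical.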

10. HPC procurement panel discussion – various speakers including Jake Carroll
30 mins

WHO SHOULD ATTEND

IT workers who maintain the underlying Computing and Data Infrastructure used by scientists to do eResearch.

WHAT TO BRING

No special equipment required. Some background in IT required, preferably in HPC/Cloud computing.

 


BIOGRAPHIES

Greg Lehmann has 35 years of IT experience. Greg worked at the University of Queensland in his early career and has had varied mini careers in CSIRO. At present he works in the data team focused on filesystem delivery for HPC and cloud. Greg still has a strong interest in HPC systems in general, which was the focus of his previous role. He is also the InfiniBand fabric tech lead for CSIRO.

Jake Carroll is currently the Associate Director of Research Computing for UQ’s three large scientifically intensive research institutes – the Australian Institute for Bioengineering and Nanotechnology, the Institute for Molecular Bioscience and the Queensland Brain Institute.

Jake has spent the last 12 years in scientific computing, working on everything from building supercomputers to managing the strategy and complexity that comes with scientific endeavour.

Jake spends his time working to make scientific computing platforms, technology and infrastructure as good as it can be, such that world class research can be conducted, unencumbered.

 

Collaborative Research in Practice – a roadmap to using Git and Docker

Convener: Rebecca Lange
Presenters: Dr Rebecca Lange1, Mark Gray2, Brian Skjerven3

1Curtin Institute for Computation, Curtin University, Perth, Australia, rebecca.lange@curtin.edu.au
2Pawsey Supercomputing Centre, Perth, Australia, mark.gray@pawsey.org.au
3Pawsey Supercomputing Centre, Perth, Australia, brian.skjerven@pawsey.org.au

GENERAL INFORMATION

  • Workshop Length: 1 day
  • Hands-on component: Yes, we are aiming for half of the workshop being hands-on
  • Attendees: less than 40

DESCRIPTION

The Internet has made it easy for researchers to collaborate across universities using tools such as emails, instant messaging and videoconferencing. However, these tools do not encourage transparency and reproducibility of research. For example, what would happen if you forget to put someone on your email chain and how do you reconcile different code snippets that various team members have worked on?

In this workshop we will first have a look at the modern scientific landscape in which code plays an increasingly important role in obtaining and reproducing research results.

We then move on to a hands-on introduction to version control (with Git and GitHub) and “containerising” with Docker. Finally, we will conclude the workshop with a session on using version control and Docker in practice for a collaborative and portable workflow.

1. Welcome, introductions and workshop outline.

The importance of software in research and academia. 30 minutes

2. Introduction to version control with Git and GitHub – Part I. 60 minutes

  • What is version control
  • Setting up Git
  • Creating your first repository

3. Break – 30 minutes

4. Introduction to version control with Git and GitHub – Part II. 90 minutes

  • Tracking changes
  • Exploring the repo history
  • Remote repositories on GitHub
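
As a small taste of what "tracking changes" means, the sketch below compares two versions of a file and prints a unified diff, the same line-by-line format that Git shows when you inspect changes. It uses only Python's standard `difflib` module and is a conceptual illustration, not a description of how Git stores history internally; the file name and contents are made up.

```python
import difflib


def show_changes(old_text: str, new_text: str, name: str = "analysis.py") -> str:
    """Return a unified diff between two versions of one file."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


# Two hypothetical versions of the same script.
version_1 = "import numpy as np\nthreshold = 0.5\n"
version_2 = "import numpy as np\nthreshold = 0.7\nprint(threshold)\n"

if __name__ == "__main__":
    # Lines starting with "-" were removed, lines starting with "+" were added.
    print(show_changes(version_1, version_2))
```

A version control system records this kind of difference for every saved revision, together with who made the change and why.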

5. Lunch Break – 60 minutes

6. Introduction to containers using Docker and Docker Hub. 60 minutes

  • What is Docker
  • Creating a Docker container
  • Running a Docker container
  • Containers on Docker Hub

7. Break – 30 minutes

8. A collaborative (software/coding) workflow in practice. 120 minutes

  • Working collaboratively on GitHub
  • Managing containers and Dockerfiles with Git
  • Managing your workflows using containers
  • Where to next? Pointers to further reading and learning

WHO SHOULD ATTEND

This workshop is aimed at academics/researchers who regularly write and/or work with software, and research support staff who would like to know more about useful tools for code and workflow management. We will cover:

  • how to track changes,
  • how to make sharing of code and software easier, and
  • how to manage issue tracking and workflows in a collaborative software development environment.

WHAT TO BRING

Attendees will need to bring their own laptop with some software pre-installed.

It would be beneficial if participants have used a bash terminal (or similar) before.

We will provide information on what to install and how to install closer to the time.


BIOGRAPHY

Rebecca Lange received her PhD in astronomy from the International Centre for Radio Astronomy Research at the University of Western Australia.

Before Rebecca moved to Australia she studied Astronomy and Physics at Nottingham Trent University where she also worked as a research assistant in scientific imaging for art conservation and archaeology. Her work there included the development and testing of instruments and software for imaging and spectroscopy as well as the organisation and supervision of field trips, which often required liaising with art curators and conservators.

Throughout her studies and research Rebecca has gained extensive programming as well as data analytics and visualisation experience in various programming languages.

Currently she is working as a data scientist for the Curtin Institute for Computation where she helps researchers by providing data analytics and computational support and training.

 


    About the conference

    eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.

    Conference Managers

    Please contact the team at Conference Design with any questions regarding the conference.

    © 2018 - 2019 Conference Design Pty Ltd