Ms Aniek Roelofs1,2, Dr James Diprose2, Dr Richard Hosking1,2, Dr Rebecca Handcock1,2, Professor Cameron Neylon2, Associate Professor Lucy Montgomery2, Dr Alkim Ozaygen2, Dr Katie Wilson2, Dr Chun-Kai (Karl) Huang2
1Curtin Institute for Computation (CIC), Perth, Australia
2Curtin Open Knowledge Initiative (COKI), Perth, Australia
The Observatory Platform, developed by the Curtin Open Knowledge Initiative (COKI), is an environment that follows the FAIR principles for fetching, processing and analysing data to understand how well universities operate as Open Knowledge Institutions. Around 15 different data sources are currently used and this number is growing. Managing these data sources requires a range of computational resources, which need to be maintained in a clear and repeatable way. This is especially important because external researchers should be able to easily deploy their own Observatory Platform and collect data as desired.
Addressing this calls for both a workflow management system and an infrastructure management system. The datasets vary immensely in size and complexity, so the required storage space and computational power vary as well. To avoid wasting resources, these two systems should interact with each other and adjust accordingly.
Terraform is used to deploy our system on the Google Cloud Platform. The Virtual Machine (VM) created by Terraform contains Docker containers that host Airflow, the workflow management system. When a more complex workflow is scheduled, Airflow will run an additional workflow that calls the Terraform API to create another, larger, VM that runs the complex workflows and shuts down automatically when they are finished.
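The interaction between Airflow and Terraform described above could be sketched roughly as follows. This is a minimal illustration, not the Observatory Platform's actual code. It assumes that "the Terraform API" refers to the Terraform Cloud HTTP API, and that the larger VM is switched on and off through a workspace variable. The variable name `create_worker_vm`, the size threshold, the dataset names and the workspace and variable IDs are all hypothetical. The sketch only builds the requests; sending them with an authenticated HTTP client from inside an Airflow task is left out.

```python
from dataclasses import dataclass

# Hypothetical threshold: workflows expected to handle more data than this
# are treated as "complex" and routed to the larger, temporary VM.
LARGE_WORKFLOW_GB = 50.0


@dataclass
class Workflow:
    name: str
    expected_data_gb: float


def needs_large_vm(workflows):
    """Return True if any scheduled workflow exceeds the size threshold."""
    return any(w.expected_data_gb > LARGE_WORKFLOW_GB for w in workflows)


def plan_requests(create_vm, workspace_id, variable_id):
    """Build the (method, path, body) requests that toggle the larger VM.

    Assumes the Terraform configuration creates the VM only when the
    hypothetical variable `create_worker_vm` is true, and that the
    workspace is set up to apply runs automatically.
    """
    value = "true" if create_vm else "false"
    # Step 1: update the workspace variable that controls the VM.
    set_variable = (
        "PATCH",
        f"/api/v2/workspaces/{workspace_id}/vars/{variable_id}",
        {
            "data": {
                "id": variable_id,
                "type": "vars",
                "attributes": {"key": "create_worker_vm", "value": value},
            }
        },
    )
    # Step 2: start a run so the change is planned and applied.
    start_run = (
        "POST",
        "/api/v2/runs",
        {
            "data": {
                "type": "runs",
                "attributes": {"message": f"Set create_worker_vm={value}"},
                "relationships": {
                    "workspace": {
                        "data": {"type": "workspaces", "id": workspace_id}
                    }
                },
            }
        },
    )
    return [set_variable, start_run]


if __name__ == "__main__":
    scheduled = [Workflow("small_dataset", 2.0), Workflow("large_dataset", 400.0)]
    create = needs_large_vm(scheduled)
    for method, path, _body in plan_requests(create, "ws-EXAMPLE", "var-EXAMPLE"):
        print(method, path)
```

Calling `plan_requests(False, ...)` once the complex workflows have finished yields the requests that would remove the larger VM again, which corresponds to the automatic shutdown step.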
We created an environment that can easily be set up remotely and managed within a team. Both Terraform and Airflow are open-source tools, ensuring that the Observatory Platform stays FAIR. Cost savings are achieved by using computational resources as efficiently as possible.
Aniek Roelofs is a developer at Curtin University where she is a part of the COKI team. She obtained a Master of Science in Bioinformatics from the University of Amsterdam and has experience setting up workflows and processing big data.
Having previously worked with sequencing data, she now focuses on bibliometric data to assist the Curtin Open Knowledge Initiative with its research on how well universities operate as Open Knowledge Institutions.