eResearch through Co-Design: Implementing ISO 9001 at Monash eResearch

Dr Steve Quenette1, Wojtek James Goscinski2, Komathy Padmanabhan3, Paul Bonnington4

1,2,3,4Monash University, Clayton, Australia


Organisations around the world implement quality management systems to ensure that their services, operations, products and processes meet a robust, consistent and reliable level of quality. These systems can be certified to provide confidence to the organisation’s stakeholders, users and customers. ISO 9001:2015 Quality Management Systems is one of the most widely recognised of these standards, and Monash has undertaken to certify all of its technology research platforms under ISO 9001 quality accreditation.

Since 2008, Monash’s eResearch programme has enabled and accelerated research endeavours through the application of advanced computing, data informatics, tools and infrastructure, delivered at scale and built on a “co-design” principle, whereby researchers and technologists work in mutual partnership to design, build and operate advanced computational infrastructure. In 2017, the Monash eResearch Centre undertook ISO 9001 quality accreditation. At the core of this implementation is the retention and strengthening of these researcher-technologist co-design principles.

This presentation will cover major attributes and considerations in accrediting an eResearch centre through a formal quality framework, including:

  • Researcher-technologist co-design as a fundamental principle;
  • A structured set of management axes to accommodate the varying scale, structure and scope of eResearch projects;
  • Risk-based thinking;
  • Continual improvement;
  • Future initiatives identified as an outcome of quality accreditation.

Co-design as core principle

The co-design principle recognises and values the full participation of both technologists and researchers in every stage of the design, build and operation of advanced research technology platforms and infrastructure. In essence, the most appropriate discovery environments arise when technologists/service-providers and research communities take joint responsibility for governance, technical decision making, project management, implementation and operations. Co-ownership and joint responsibility are supported by strong engagement through proactive communication with regular feedback on impact. Technical decision making is largely evidence-based, and there is a strong emphasis on leveraging existing capabilities and expertise to build future capabilities. All of this is backed by solid training and outreach.

Management axes

To accommodate the varying types of eResearch projects, Monash eResearch implemented a structure along four axes: Leadership & Strategy, Projects & Contracts, Operations & Infrastructure, and People & Expertise. All four axes were evaluated and certified against the ISO 9001:2015 principles.

The Leadership & Strategy axis is responsible for ensuring that the centre’s priorities are aligned to the overarching research priorities of the University. The Projects & Contracts axis encapsulates national, international and local collaborative projects, collaborations with other Technology Research Platforms within the University, and other internal agile research software development projects. The Operations & Infrastructure axis is responsible for the day-to-day operations of the accessible infrastructure and software platforms of the centre. The People & Expertise axis focuses on retaining national and international leadership in the eResearch space by developing and nurturing staff talent.

Figure 1: Monash eResearch quality management axes

Risk-based thinking

Unlike earlier versions of the standard, ISO 9001:2015 promotes a risk-based approach to management. The implementation process has culturally embedded risk-based thinking among staff, encouraging everyone to identify and manage risks at all levels as a continuous process. As part of the implementation, a comprehensive risk re-evaluation was conducted across all the management axes to identify risks and to categorise and prioritise them based on impact and likelihood. Processes and plans have been put in place to ensure that risks are reviewed, mitigated and escalated as appropriate. Critical risks are addressed at management and leadership reviews, and have resulted in revisions of strategic priorities and investment.

Continual Improvement

The implementation process has provided a structured framework for continual improvement of standard operating procedures, communication plans and strategy, regular researcher feedback, staff engagement and development, stakeholder reviews, and asset management. Policies and procedures were reviewed to ensure they align with the objectives and values of the centre.

Regular review mechanisms have been embedded into the day-to-day functions of the centre, and internal cross-functional audits are scheduled.

Future Initiatives

As an outcome of quality accreditation, Monash eResearch is undertaking a program of work to improve internal and external communications and researcher relationship management. Planning and early outcomes of this program will be presented.


Dr Steve Quenette is Deputy Director of the Monash eResearch Centre. This multi-disciplinary centre now includes over 40 eResearch and IT professionals providing expertise, computing, visualisation and data capabilities to numerous research areas such as cryo-electron microscopy, macromolecular crystallography, neuroscience, archaeology, proteomics, genomics, structural biology, biomedical imaging, climate modelling, computational chemistry, materials engineering and fluid dynamics. Since 2010, the centre has been selected to host over $20M of Australia’s federally-funded national eResearch infrastructure for specialised high-performance computing, research cloud services, and data storage and data management, underpinning the research of over 4,000 researchers. The work of the centre, particularly in real-time data processing on software-defined infrastructure, is internationally regarded, and many of its innovations have been adopted in related European projects. The centre is also a global Centre of Excellence or strategic technology partner for NVIDIA, Mellanox, Dell and Red Hat.

Scientific workflow uptake – What are the challenges?

Siddeswara Guru1, Minh Dinh1, David Abramson1, Gareth Price1, Damien Watkins2, Lachlan Hetherton2, Alan Both3

1University of Queensland, Brisbane, Australia

2Data61 CSIRO, Melbourne, Australia

3RMIT University, Melbourne, Australia


A scientific workflow is a series of well-defined, coordinated, structured activities that define a particular investigation or experiment in a scientific context [1]. Workflows are useful in science because they enable scientists to:

  • describe, manage, share and execute scientific analyses;
  • provide a high-level abstract view of scientific computation, while hiding underlying details;
  • interface with distributed computing environments;
  • capture the complete workflow as an artefact and make it a reusable entity [2];
  • capture provenance information for further analysis and knowledge re-use.
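As a minimal illustration of these ideas (and not of any particular SWMS named above), a workflow can be modelled as a directed acyclic graph of named steps executed in dependency order, with a provenance record captured for each step. The class and step names below are hypothetical:

```python
"""Sketch of a workflow engine: a DAG of named steps run in dependency
order, with per-step provenance captured. Purely illustrative; real
SWMSs add scheduling, distribution and persistence on top of this idea."""
from graphlib import TopologicalSorter

class Workflow:
    def __init__(self):
        self.steps = {}       # name -> (function, upstream step names)
        self.provenance = []  # ordered record of what ran, and its inputs

    def add_step(self, name, func, depends_on=()):
        self.steps[name] = (func, list(depends_on))

    def run(self, **inputs):
        results = dict(inputs)
        deps = {name: d for name, (_, d) in self.steps.items()}
        # static_order() yields each step only after its dependencies
        for name in TopologicalSorter(deps).static_order():
            func, upstream = self.steps[name]
            args = {u: results[u] for u in upstream}
            results[name] = func(**args) if upstream else func(**inputs)
            self.provenance.append(
                {"step": name, "inputs": sorted(args) or sorted(inputs)})
        return results

# Hypothetical two-step analysis: clean raw values, then take the mean.
wf = Workflow()
wf.add_step("clean", lambda raw: [x * 2 for x in raw])
wf.add_step("mean", lambda clean: sum(clean) / len(clean), depends_on=["clean"])
results = wf.run(raw=[1, 2, 3])
```

The abstraction hides execution details (here, a topological sort) from the scientist, while the `provenance` list is the artefact that makes the run reusable and auditable.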

In an eResearch 2017 BoF session, we provided an overview of some of the scientific workflow management systems (SWMSs) (e.g., Kepler, Galaxy, Workspace) used in different science disciplines. Notably, an interactive Q&A panel discussed the motivations and use cases of scientific workflows, how to choose the right tool for a particular application, and how to develop a community around workflow management systems.

While some SWMSs have proven successful in improving the rate of scientific discovery, overall uptake of scientific workflows for eResearch is still limited. In this year’s BoF, we address the challenges in the uptake of these SWMSs from the perspectives of domain scientists, eResearch analysts, workflow engine developers and decision makers. In particular, we will engage with technical issues in the following areas:

  • developing workflows and subsequent tools;
  • debugging individual workflow components and the workflow as a whole;
  • leveraging cloud resources and capabilities;
  • scheduling workflow jobs in the cloud;
  • provenance tracking and propagation;
  • platforms to use and run workflows;
  • reproducibility challenges;
  • deploying and sharing workflows.

BoF Details:

  • Short presentations from domain scientists and eResearch analysts on their experience in developing and using workflow management systems, including Kepler, Galaxy, KNIME, and Workspace. An open discussion on the challenges of operationalising complex processes using workflows, and lessons from different tools. The BoF will conclude with a concrete plan to improve scientific workflow practice for knowledge sharing and capacity building.
  • The BoF session will run for 60 minutes. The first 20 minutes are allocated to an introduction to the BoF and short presentations, the next 30 minutes to a panel discussion on the challenges of uptake, and the final 10 minutes to future coordination and planning.


  1. Talia, D. Workflow Systems for Science: Concepts and Tools. ISRN Software Engineering, 2013.
  2. Guru, S.M., I.C. Hanigan, H.A. Nguyen, E. Burns, J. Stein, W. Blanchard, D. B. Lindenmayer, and T. Clancy, Development of a cloud-based platform for reproducible science: the case study of IUCN Red List of Ecosystems Assessment. Ecological Informatics, 2016.


Siddeswara Guru is a program lead for the TERN data services capability. He has experience in the development of domain-specific research e-infrastructure capabilities.

The Australian Research Data Commons – Building on the foundations of ANDS, Nectar and RDS to become a transformational investment

Convener: Ian Duncan1

1Acting Executive Director, ARDC, QLD



A key recommendation in the 2016 National Research Infrastructure Roadmap1 was to bring together ANDS, Nectar and RDS projects into a single entity.  This was achieved in June 2018 with the establishment of the Australian Research Data Commons (ARDC).

As part of the establishment of the ARDC, a strategic plan has been developed which sets out the strategic intent for the next 5 years. This plan has been developed in consultation with key partners and the community, with the intent of building on the strengths of ANDS, Nectar and RDS as a transformational investment. The ARDC will partner to facilitate a coherent research environment that enables Australia’s researchers to find, access, contribute to and effectively use leading data-intensive eResearch infrastructure, maximising research quality and impact by developing a world-leading data advantage, accelerating innovation, fostering collaboration for borderless research, and enhancing researchers’ ability to translate their research into benefits for society.

This BoF will set out the roadmap for the next 5 years and provide an opportunity for the community to engage with the ARDC as it embarks on this journey as well as examine the opportunities for the ARDC to partner with NCRIS facilities, institutions and research communities.

Proposed Format

The session will start with a short overview presentation to set the scene, followed by a series of round tables focused on the ARDC strategic pillars, and will conclude with a panel discussion and summary.


  1. Commonwealth Department of Education and Training, 2016 National Research Infrastructure Roadmap, 2016.


Ian is the Acting Executive Director of the Australian Research Data Commons (ARDC). He has held roles including Director of the RDS NCRIS project, has led programs within ANDS, and has been Associate Director of Enterprise Support and Associate Director of Infrastructure and Operations at the University of Queensland, as well as founding, running, and selling his own Internet Payment Gateway company and ISP, and working for the Shell Oil Company and National Australia Bank. He has a degree in Economics and Politics, is married to a Professor researching Alzheimer’s Disease, and has two fantastic teenage kids. He sees opportunities for collaboration, reinforcement, support, and excellence throughout the research sector, and feels immensely positive and optimistic about the impact the ARDC, together with its partners, can bring about.

Establishing a place-based data lab at Griffith University

Dr Tom Nik Verhelst1, Malcolm Wolski2, Linda O’Brien3

1Griffith University, Regional Innovation Data Lab, Meadowbrook, Australia

2Griffith University, Brisbane

3Griffith University, Logan


The Regional Innovation Data Lab

Data is being collected at an ever-increasing pace, with 90 percent of all current data created within the last two years. New technologies in the IoT space are capable of creating enormous amounts of machine-generated data. All this data holds a promise: the promise of a better world, where machine learning and artificial intelligence can create insights for the betterment of humankind. Yet most of this data is housed in unconnected data islands. There are great analytics tools for structured tabular data, but tools for exploring non-tabular data, such as messaging data, pictures, video and a whole series of industry-specific data, are still emergent. The challenges of data volume, variety, velocity and veracity risk drowning us in data while we starve for insight.

The Regional Innovation Data Lab (RIDL) is an initiative based at Griffith University Logan Campus, a campus which blends Griffith’s research, teaching and engagement strengths into meaningful impact and influence within a specific community as a true example of a Civic University. Through deep engagement with government, business and the broader community, we put our academic knowledge, creativity and expertise to work, to develop innovations and solutions that make a positive difference within our community.

RIDL was created to facilitate insight through data for policy makers, researchers, NGOs and individuals, and to inspire the next generation of data entrepreneurs. It provides easy access to a series of trusted data sources. We use multiple linked data sources to help policy makers make informed decisions and, ideally, prevent problems in our cities and regions before they happen.

Figure 1: RIDL Vision

How and Who?

Together with AURIN (Australian Urban Research Infrastructure Network) and QCIF (Queensland Cyber Infrastructure Foundation), we have created a rich, holistic data set with an initial primary focus on Logan City and its direct surroundings. Aggregated data is collected and integrated by our spatial data experts. Our team of web developers creates easy-to-use, topic-specific dashboards. These apps allow easy analysis of changes in socio-economic parameters of interest. By releasing apps with easy-to-use, limited functionality, we increase the appetite and expertise for data analysis and data exploration within our application community, and gradually build more complexity into our solutions.

We provide direct data access via data science tools like R and Python, using RStudio and Jupyter, for researchers who contribute to answering specific socio-economic questions or require data for their research projects.
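As a hypothetical sketch of the kind of analysis such a researcher might run in a Jupyter notebook, consider aggregating a linked socio-economic extract by suburb. The suburb names, column names and figures below are invented for illustration; in practice the data would come from the lab’s trusted, aggregated sources:

```python
"""Toy aggregation over a place-based socio-economic extract.
All values here are invented for illustration only."""
import csv
import io
from collections import defaultdict

# Stand-in for an extract pulled from the linked data sources.
RAW = """suburb,year,unemployment_rate
Logan Central,2017,9.1
Logan Central,2018,8.7
Meadowbrook,2017,5.9
Meadowbrook,2018,5.5
"""

def mean_rate_by_suburb(raw):
    """Average the unemployment rate per suburb across all years."""
    rates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw)):
        rates[row["suburb"]].append(float(row["unemployment_rate"]))
    return {suburb: round(sum(v) / len(v), 2) for suburb, v in rates.items()}

summary = mean_rate_by_suburb(RAW)
```

The same few lines of grouping and averaging are what the topic-specific dashboards surface graphically for users who do not write code.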

Figure 2: RIDL Sweet Spot

Use cases

Logan Together is a 10-year collective impact project that is changing the lives of kids and families in Logan for the better. RIDL provides accurate data to track the Logan Together community initiatives; by using holistic, place-based data sets we can identify and quantify secondary and tertiary effects of those initiatives.

In collaboration with Griffith’s Policy Innovation Hub and our academic groups, we use the lessons learned from Logan Together and the comprehensive data sets we have collected to provide insight into other communities within Queensland. By leveraging multi-parameter data sets that range from social service data, health data and financial data to education data (not an exhaustive list), we provide unprecedented insight into communities and help cities, community centres and NGOs direct their energy and efforts more effectively.

Figure 3: RIDL app example

Through the Policy Innovation Hub, we are working together with local, state and federal government agencies to enable evidence-based policy making. By combining as large a relevant data set as possible, we can track changes in socio-economic parameters that are affected by new policies. Together with our researchers and data scientists, we apply algorithms to simulate and predict policy outcomes, and provide local, state and federal policy makers with insights into optimal policy implementation.



Dr Tom Verhelst is the Program Director of the Regional Innovation Data Lab at Griffith University, Logan Campus. Tom comes from Belgium with strong university and industry R&D experience in applying novel platform technologies to scientific advancement, generating beneficial socio-economic impacts. Tom has experience in the fields of digital platform technologies, wearables, data science, and virtual & augmented reality.

Tom is driven to make a positive change in the world and hopes to achieve this by bringing people and technology together for the better. As Program Director of the Regional Innovation Data Lab, he hopes to improve the lives of people in Logan, Queensland and Australia. Our place-based initiative will focus on using different trusted data sources to create insight into real-world challenges and inform policy development. We want to detect and prevent problems in our cities and regions before they happen.

Reintegrating dispersed collections of ancient Cypriot glass and faience in virtual spaces

Ms Kellie Youngs1

1School of Historical and Philosophical Studies, University of Melbourne, Melbourne, Australia



This presentation is about a pilot project that has developed as an ancillary activity of my core PhD project: The Transmission and Innovation of Faience and Glass Technologies of Cyprus in the Late Bronze Age. While examining widely dispersed collections of Cypriot glass and faience objects in London, Stockholm, and on Cyprus to compare composition and morphology and to collate the various locations and dating data of objects, it occurred to me that the eResearch tools available to me could provide much more than just a repository for a novel searchable database of 3D-scanned objects and their physical characteristics. The digital imagery and metadata could also serve as an excellent pilot dataset for scientific analysis in a broader, comparative context, and as a way of bringing together objects from tightly-held collections across the globe that have not been assembled in one place since they came out of the ground over a period of more than a hundred years, or, as is more often the case, since they went into the ground several thousand years ago.


Society on Cyprus during the Protohistoric Bronze Age (1750/1700-1100/1050 BCE) went through significant and rapid changes, many of which are imperfectly understood.  Building on an agro-pastoral economic base, Cypriots extended their society into a more industrial, town-centered way of life that was more stratified and international in outlook (Webb 2005).  Scholars emphasise the development of the copper industry as the major contributing factor to the accelerated growth of the Cypriot economy, as it was ushered into the prominent and extensive system of international trade in the eastern Mediterranean (Knapp 2013, 416).  However, many interrelated questions of identity remain, particularly regarding the formation of social, political, and economic entities, as well as migration, integration, materiality, and connectivity.  To illuminate these processes of change, I am surveying the import, manufacture, and use of two luxury materials, glass and faience, in Cyprus during the Late Bronze Age to provide a material context for the examination of power relations.

To contribute to these larger points of discussion, my research project commenced with an examination of the production and dissemination of glass and faience necessitating travel to Cyprus, Sweden, and Britain to examine and scan objects in museum collections.  Many of the objects in these collections were found during excavations in Cyprus undertaken eighty to one hundred and twenty years ago (Murray, Smith and Walters 1900, SCE 1934), when ‘division of finds’ practices saw many objects removed directly from the excavated site to overseas museums at the completion of each season.  The international dispersal of these artifacts causes difficulties for researchers including access to objects, understanding their lost context, and undertaking comparative analysis. Moreover, the manner of their dispersal resulted in many objects never being seen by a Cypriot person other than the local diggers who liberated them from the ground, or those with sufficient fortune and knowledge to travel overseas and find them in foreign collections, disconnecting this community from their own cultural heritage.


To create a curated collection of artifact scans and provide access for both researcher and visitor to a future virtual museum, this methodology can be divided into two main tasks: (1) scanning and recording objects and associated metadata and provenance, and (2) selecting and visualizing one or more objects at once.


I created a collection of 3D images of key objects using a mid-range, consumer-grade 3D scanner that provides an object mesh and material texture layer.  Using the Matter and Form™ scanner for my data acquisition had the benefits of being fast, inexpensive, and portable. It also posed some challenges and limitations regarding object sizes, shapes, and supports suitable for scanning, as well as a challenge to the perceptions of museum curators and collections staff before and after seeing the scanner in action.


In consultation with the team at Monash Immersive Visualization Platform (MIVP), two solutions for visualizing my data, PreVis and EnCube, were identified to create a curated collection of artifact scans and provide access for both researcher and visitor to a future virtual museum. Utilizing PreVis, an eResearch workflow tool developed as an aggregation of other similar tools, I can prepare my captured data, load it for pre-visualization and then analysis, in a single workflow ready for visualization in CAVE2, Head-mounted VR, or on the Desktop. To examine objects individually, tag numbers are loaded on the CAVE iPad for viewing in LavaVU.  To view objects simultaneously in the CAVE, a batch of up to 80 objects can be loaded in EnCube.

A reintegrated collection of Cypriot glass and faience objects shown simultaneously in EnCube makes possible the examination of a variety of construction characteristics, through activities such as altering the orientation of all objects to compare the types of bases applied.  It is also possible to refine our understanding of find contexts by looking at subsets of objects, such as all objects from the site of Enkomi, or all the objects found in graves compared to those found in temples.

Future Work: My long-term goal is to make visualizations of this data available to the international research community and the public for community engagement projects.  An initial step has been to arrange a simultaneous VR teleconference with archaeologists and cultural heritage professionals at the Cyprus Institute – Science and Technology in Archaeology Research Center (STARC) in Nicosia.  This will be the first time Cypriot scholars will see these Cypriot objects all together, hopefully leading to ongoing international, cross-institutional, and interdisciplinary opportunities to collaboratively create and analyze a wider data set of Cypriot archaeological objects and to generate visualizations that convey more comprehensive perspectives of the material culture of the Eastern Mediterranean in the Late Bronze Age.


Gjerstad, Einar. 1934. The Swedish Cyprus Expedition: Finds and Results of the Excavations in Cyprus, 1927-1931 (SCE), Stockholm, Swedish Cyprus Expedition, 1934-1972.

Knapp, A. Bernard. 2013. The Archaeology of Cyprus: From Earliest Prehistory through the Bronze Age, Cambridge, Cambridge University Press.

Murray, Alexander Stuart, Arthur Hamilton Smith, and Henry Beauchamp Walters. 1900. Excavations in Cyprus (bequest of Miss E.T. Turner to the British Museum), London, Trustees of the British Museum.

Webb, Jennifer. 2005. Ideology, Iconography and Identity: The role of foreign goods and images in the establishment of social hierarchy in Late Bronze Age Cyprus, in J. Clarke (ed.), Archaeological perspectives on the transmission and transformation of culture in the Eastern Mediterranean, Oxford, Oxbow Books, 176-182.


Archaeologist and graduate researcher at the University of Melbourne, School of Historical and Philosophical Studies.

Research interests: Technological innovation and logistics in the ancient Eastern Mediterranean, the relationships between people and landscape, and the archaeology of conflict and commemoration.

Methodologies include 3D object imaging, spatial analyses, and the application of Geographic Information Systems (GIS) to model logistical links between urban environs and landscape, and address archaeological questions.

Fieldwork undertaken in Australia and Cyprus.

Supervisors: Associate Professor Louise Hitchcock and Dr Andrew Jamieson

Next Generation eResearch Leaders Roundtable



  • The intent of the workshop is to provide a series of highly interactive sessions, with practical exercises that demonstrate the key learning outcomes in a practical sense. The style of the sessions also allows the knowledge of the participants to be shared, as well as giving participants an opportunity to have access to each other.
  • A maximum of 36 participants.


The Next Generation eResearch Leaders Roundtable is a small-cohort workshop for new managers or leaders working in the eResearch system and individuals actively seeking their first management/leadership role. The roundtable provides over six hours of sessions, designed to broaden perspectives and develop leadership abilities, enabling attendees to assume greater leadership roles within their organisations.  The goals of the workshop are to enhance grounding in the overall context in which leadership takes place in the eResearch sector, develop an understanding of the style and context in which decisions are made, and enhance awareness of the need for strong communication, partnership building, and organisational skills.

You’ll have time for reflection, synthesis, and informal networking during the eResearch Australasia Conference.

You will have the opportunity to develop relationships with a cohort you can turn to for advice and guidance as you progress in the CIO and/or CISO role.



A 9.00am start and 5.00pm finish

1 hour for lunch

  1. Leadership in the eResearch Landscape (60 minutes)
  2. Emotional Intelligence (45 minutes)
  3. Influencing with Stories (60 minutes)
  4. Lunch (60 minutes)
  5. 20/20 Insight session (60 minutes)
  6. Spheres of Influence and Partnerships (45 minutes)
  7. Organisational Decision Making (60 minutes)
  8. Leadership Roundtable (60 minutes)


The workshop is aimed at eResearch professionals who are in a management/leadership role or who aspire to be the next generation of managers and leaders in eResearch.


No special requirements for delegates.


Systems Administration in Research Computing

Conveners: Mr Greg Lehmann1, Mr Jake Carroll2

Gin Tan3, Dr Robert Bell4, Michael Mallon6, Linh Vu7, Steve McMahon5

1CSIRO, Pullenvale, Australia
2The University of Queensland, St. Lucia, Australia
3Monash University, Melbourne, Australia
4CSIRO, Melbourne, Australia
5CSIRO, Canberra, Australia
6The University of Queensland, Brisbane, Australia
7The University of Melbourne, Melbourne, Australia


The workshop will be a full-day event, without a hands-on component. There is no limit on the number of attendees, and there are no special equipment requirements.


Research computing uses tools and techniques that are specialised in nature. Systems administrators working with these tools, and the scientists who use them, have a different skill set to the IT norm. This workshop will present information in this area and showcase use cases, with the aim of knowledge transfer between practitioners.

1. Workshop introduction and site introductions. 5 minutes per site, e.g.

a. Pawsey
b. NCI
c. DST
d. Monash
e. Swinburne
f. CQU
g. From the floor

2. Space/data management techniques: flushing, quotas and HSM with encapsulation; data life cycle and the dataset concept; excluding publication of datasets. – various – Rob Bell, Greg Lehmann, David Rose
45 mins


3. BeeGFS Use Cases in Australian HPC – Jake Carroll and Greg Lehmann
(1) Filesystems for accelerated computing – Australia’s first all flash BeeGFS production environment

Through analysis and system observability, it has become evident that accelerated supercomputing presents a new kind of challenge to filesystems. This presentation discusses the challenges the University of Queensland faced in scaling DL, AI, ML and deconvolution workloads, and the pressures these workloads created on traditional parallel filesystems. Having eventually arrived at an RDMA all-flash BeeGFS implementation, this presentation details the architectural considerations, workloads and corner cases that motivated such an approach.

(2) CSIRO’s new scratch FS – a first look a couple of months in.
30 mins

4. A Year with CephFS for HPC – Linh Vu
This presentation discusses the findings and challenges that the University of Melbourne experienced within a year of implementing CephFS as the storage solution for our growing HPC service. I will talk about our journey from a small 6-node, 768TB (raw) NLSAS proof-of-concept cluster to over 10 times that size, with a mix of NLSAS, SAS SSD and NVMe SSD storage pools to cater for different workloads. I will address the design, technical and managerial challenges we have faced in bringing a relatively unknown filesystem to HPC, in which we are now heavily investing.
30 mins

5. Efficiently sharing data between HPC and cloud computing platforms – Michael Mallon
One of the guiding principles of the Medici project is to make where data lives somewhat independent of how a researcher might want to consume it. Adhering to this principle enables researchers to choose the most appropriate tool for a particular part of a workflow without incurring a mirroring or replication overhead. One of the more difficult places to adhere to this principle is the intersection of cloud computing and HPC resources in workflows. I’ll talk about how we’ve addressed this using GPFS’s unified object and file interface and SwiftHLM.
30 mins


6. Ansible for Cluster Build – Gin Tan
The new M3 cluster is a little different from a traditional HPC cluster. The cluster sits on the Monash research cloud, and instances are provisioned with Ansible – we call it cluster-in-a-box. The idea is to be able to provision a cluster anytime and anywhere we want.
30 mins

7. OpenHPC Experiences on the UQ Wiener cluster – Jake Carroll
30 mins

8. Using Bright Cluster Manager to streamline and improve HPC operations – Steve McMahon
Managing HPC systems can be complex.  There’s a lot happening and a lot of things to check to make sure they are working correctly.  This talk is about how using a product like Bright Cluster Manager can simplify HPC operations, check for common problems and improve service levels.
30 mins


9. Slurm on OzStar at Swinburne – Chris Samuel
This short talk will cover how we use Slurm on Swinburne’s OzStar GPU cluster. It will cover what plugins we use, and why, as well as how we try and balance the various competing requirements for scheduling our workload through fair-share, partition configurations and our Lua job submit plugin. If time permits it will also cover as yet unsolved problems we wish to address.
30 mins

9. Scheduling containers in the cloud and hpc – Gin Tan
How we use the same container to run jobs in both Kubernetes and Slurm. The idea is to take HPC workload bursting into the cloud and looking for suggestions from the crowd as well if there’s any. The workload will be as simple as using Tensorflow in the container.
30 mins

10. HPC procurement panel discussion – various speakers including Jake Carroll
30 mins


IT workers who maintain the underlying Computing and Data Infrastructure used by scientists to do eResearch.


No special equipment required. Some background in IT required, preferably in HPC/Cloud computing.



Greg Lehmann has 35 years of IT experience. He worked at the University of Queensland early in his career and has since had varied mini-careers within CSIRO. At present he works in the data team, focused on filesystem delivery for HPC and cloud. Greg retains a strong interest in HPC systems in general, the focus of his previous role. He is also the InfiniBand fabric tech lead for CSIRO.

Jake Carroll is currently the Associate Director of Research Computing for UQ’s three large scientifically intensive research institutes – the Australian Institute for Bioengineering and Nanotechnology, the Institute for Molecular Bioscience and the Queensland Brain Institute.

Jake has spent the last 12 years in scientific computing, working on everything from building supercomputers to managing the strategy and complexity that comes with scientific endeavour.

Jake spends his time working to make scientific computing platforms, technology and infrastructure as good as it can be, such that world class research can be conducted, unencumbered.


Prediction of Drug Target Interaction Using Association Rule Mining (ARM)

Dr Nurul Hashimah Ahamed Hassain Malim1, Mr Muhammad Jaziem Mohamed Javed1

1Universiti Sains Malaysia, Penang, Malaysia,



Drug repositioning identifies new indications (i.e. new diseases) for known drugs [1]. It is an innovative stream of pharmaceutical development that offers an edge to both drug developers and patients, since the medicines are already known to be safe [2]. It is regarded as a successful alternative route in the drug discovery process because several drugs have been successfully repositioned to new indications in the past, most prominently Viagra and Thalidomide, which in turn brought higher revenue [2]. The main reason drug repositioning is possible is the now-accepted concept of ‘polypharmacology’ [3]. In general, polypharmacology shifted the idea of drug development from “one drug, one target” to “one drug, multiple targets” [4]. Its involvement in drug discovery can be seen when (a) a single drug acts on multiple targets of a unique disease pathway, or (b) a single drug acts on multiple targets across multiple disease pathways; the polypharmacological properties of a drug help us to identify more than one target it can act on, and hence new uses of the drug can be discovered [4]. The use of in silico methods to predict the interactions between drugs and target proteins provides a crucial leap for drug repositioning, as it can remarkably reduce wet-laboratory work and lower the cost of experimentally discovering new drug–target interactions (DTIs) [5].


The similarity searching technique, which falls under the ligand-based category, can be classified as a well-established method, having been used by many researchers to predict DTIs [6]. Driving the introduction of these new applications is the desire to find patentable, more suitable lead compounds, as well as to reduce the high failure rates of compounds in the drug discovery and development pipeline [7]. As shown in Figure 1.0 below, a new DTI prediction is made when the method finds another reference ligand (nearest neighbour) using a single ligand (active query) with known biological activity as the search query [8]. The reference ligand, discovered by screening the query against a large database of compounds, is then assumed to bind the same target as the query compound did, and is treated as a potential drug [8]. The rationale of this screening method is that true binders/drugs share similar functional groups and/or geometric shapes, given the interacting hot spots within the binding site of the respective protein [9]. Despite its edge in identifying new drugs, similarity searching has several disadvantages. First, it depends on the availability of known ligands, which may not exist in the earlier stages of the drug discovery process; in other words, it needs at least one ligand compound to initiate the search [8]. Second, because it is based on ligand similarity, it has difficulty identifying drugs with novel scaffolds that differ from the query compounds [10]. Finally, it determines neither the binding position of the ligand within the binding site nor the binding score between the ligand and the protein [11].
The binding mode within the binding site is crucial for exploring the response mechanism between the protein and the ligand, and for the accuracy of the identified drug lead. The binding energy score, which relies on correctly predicted binding modes, also plays an important role when optimising drug leads.
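The nearest-neighbour idea behind similarity searching can be sketched in a few lines of Python. This is a toy illustration, not the authors’ implementation: the fingerprints below are invented sets of bit positions, whereas a real system would use proper molecular fingerprints screened against a large database.

```python
# Hypothetical sketch of ligand-based similarity searching: molecules are
# represented as binary fingerprints (here, plain Python sets of "on" bit
# positions) and ranked against a query by the Tanimoto coefficient.
# The fingerprints below are illustrative toy data, not real molecules.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity_search(query: set, database: dict, top_n: int = 3):
    """Rank database molecules by Tanimoto similarity to the query ligand."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy example: the nearest neighbours of the query are assumed to bind the
# same target, and thus become repositioning candidates.
query_fp = {1, 2, 3, 5, 8}
db = {
    "mol_A": {1, 2, 3, 5, 9},   # close analogue
    "mol_B": {2, 5, 8, 13},     # partial overlap
    "mol_C": {21, 34, 55},      # unrelated scaffold
}
hits = similarity_search(query_fp, db)  # mol_A ranks first
```

The top-ranked hit is the “reference ligand” of Figure 1.0: it shares the most substructure with the active query and is therefore assumed to bind the same target.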

Knowledge Discovery in Databases (KDD) can be defined as the use of methods from domains such as machine learning, pattern recognition, statistics and related fields to deduce knowledge from huge collections of data, where that knowledge is absent from the database structure [12]. Very large amounts of data are also characteristic of pharmaceutical company databases, which has led to the growing use of KDD methods within the drug discovery process. Lately, however, researchers have diverted their interest to methodologies that can explain molecular activity in depth [12]. Such methods are not expected to improve prediction accuracy, but they can still assist medicinal chemists in developing the next marketable drugs [12]. This situation has prompted related KDD techniques to be introduced to chemoinformatics, one of them being Association Rule Mining (ARM) [12]. ARM shares properties with machine learning classification methods but differs slightly in its primary aim, focusing on explanation rather than classification [12]: it identifies the features, or groups of features, that may determine a particular classification for a set of objects [12]. ARM’s promising performance in several instances of target prediction has made it attractive for predicting DTIs.


The information contained within activity classes from the heterogeneous and homogeneous categories of the ChEMBL database is important, as it can be used to build the classification model. In our experiment we use that information to generate rules that determine the protein targets for a particular ligand. Each generated rule is based on its associated support and confidence levels. Support indicates how frequently the items appear in the database, while confidence specifies how often the if/then statement has been found to be true. From the support and confidence scores, we select the best rules for target prediction, and these rules are used to predict protein targets for future ligands. The biggest challenge of ARM, however, is that frequent-itemset generation is compute intensive, so it is crucial that it be executed on a high-performance machine. At the moment we lack high-performance computing resources, which limits our ability to fully explore ARM’s capability in relation to our objectives. Nevertheless, we have obtained results for certain parameter ranges, which will be presented on the poster.
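The support and confidence measures described above can be illustrated with a small, self-contained sketch. The transactions, feature names and thresholds here are invented for demonstration; the actual study mines activity classes from ChEMBL at much larger scale.

```python
# Minimal illustration of the support and confidence measures behind ARM.
# Each transaction couples (hypothetical) ligand features with the target
# the ligand is active against.

transactions = [
    {"ring", "amine", "targetX"},
    {"ring", "amine", "targetX"},
    {"ring", "halogen", "targetY"},
    {"amine", "targetX"},
    {"ring", "amine", "targetY"},
]

def support(itemset, txns):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """support(A + C) / support(A): how often the rule A -> C holds."""
    return support(set(antecedent) | set(consequent), txns) / support(antecedent, txns)

# Keep rules "feature -> target" that clear minimum support and confidence.
rules = [
    (f, t, support({f, t}, transactions), confidence({f}, {t}, transactions))
    for f in ("ring", "amine", "halogen")
    for t in ("targetX", "targetY")
    if support({f, t}, transactions) >= 0.4
       and confidence({f}, {t}, transactions) >= 0.6
]
# Only "amine -> targetX" survives (support 0.6, confidence 0.75).
```

Frequent-itemset generation (e.g. Apriori) enumerates many such candidate itemsets, which is where the combinatorial, compute-intensive cost arises.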


Figure 1.0: Conventional similarity searching method used to predict new ligand that will interact with a particular target [8].



[1] L. Yu, X. Ma, L. Zhang, J. Zhang and L. Gao, “Prediction of new drug indications based on clinical data and network modularity”, Scientific Reports, vol. 6, no. 1, 2016.

[2] T. T. Ashburn and K. B. Thor, “Drug repositioning: identifying and developing new uses for existing drugs”, Nat. Rev. Drug Discovery, vol. 3, pp. 673–683, 2004.

[3] J. C. Nacher and J. M. Schwartz, “Modularity in protein complex and drug interactions reveals new polypharmacological properties”, PLoS One, vol. 7, e30028, 2012.

[4] J. Peters, “Polypharmacology – Foe or Friend?”, Journal of Medicinal Chemistry, vol. 56, no. 22, pp. 8955–8971, 2013.

[5] “Computational drug discovery: Topics by”, 2017. [Online]. Available: [Accessed: 06-Sep-2017].

[6] T. Katsila, G. Spyroulias, G. Patrinos and M. Matsoukas, “Computational approaches in target identification and drug discovery”, Computational and Structural Biotechnology Journal, vol. 14, pp. 177–184, 2016.

[7] J. Auer and J. Bajorath, in: J. Keith (ed.), Bioinformatics, Humana Press, pp. 327–47, 2008.

[8] P. Willett, J. M. Barnard and G. M. Downs, “Chemical similarity searching”, Journal of Chemical Information and Computer Sciences, vol. 38, pp. 983–996, 1998.

[9] S. Huang, M. Li, J. Wang and Y. Pan, “HybridDock: A Hybrid Protein–Ligand Docking Protocol Integrating Protein- and Ligand-Based Approaches”, Journal of Chemical Information and Modeling, vol. 56, no. 6, pp. 1078–1087, 2016.

[10] N. Wale, I. Watson and G. Karypis, “Indirect Similarity Based Methods for Effective Scaffold-Hopping in Chemical Compounds”, Journal of Chemical Information and Modeling, vol. 48, no. 4, pp. 730–741, 2008.

[11] D. Mobley and K. Dill, “Binding of Small-Molecule Ligands to Proteins: “What You See” Is Not Always “What You Get””, Structure, vol. 17, no. 4, pp. 489–498, 2009.

[12] E. Gardiner and V. Gillet, “Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis”, Journal of Chemical Information and Modeling, vol. 55, no. 9, pp. 1781–1803, 2015.



Nurul Hashimah Ahamed Hassain Malim (Nurul Malim) received her B.Sc (Hons) in computer science and M.Sc in computer science from Universiti Sains Malaysia, Malaysia. She completed her PhD in 2011 from The University of Sheffield, United Kingdom. Her current research interests include chemoinformatics, bioinformatics, data analytics, sentiment analysis and high-performance computing. She is currently a Senior Lecturer in the School of Computer Sciences, Universiti Sains Malaysia, Malaysia.

Cybercriminal Personality Detection through Machine Learning

Dr Nurul Hashimah Ahamed Hassain Malim1, Saravanan Sagadevan1, Muhd Baqir Hakim1, Nurul Izzati Ridzuwan1

1 Universiti Sains Malaysia, Penang, Malaysia,



The development of sophisticated communication technologies such as social networks has exponentially raised the number of users who participate in online activities. Although this development brings many positive social benefits, the dark side of online communication remains a major concern in virtual interactions; it is often referred to as cyber threats or cybercrime. Over the past two decades, cybercrime cases have increased exponentially and have threatened the privacy and lives of online users. Severe kinds of cybercriminal activity, such as cyberbullying and cyberharassment, are often executed by exploiting text messages and the anonymity offered by social network platforms such as Facebook and Twitter. However, linguistic clues such as patterns of writing and expression in text messages often act as fingerprints, revealing the personality traits of the culprits who hide behind that anonymity [1]. Personality traits are hidden abstractions that combine the emotions, behaviour, motivation and thinking patterns of humans, and they often mirror people’s true characteristics through activities conducted intentionally or unintentionally [2]. Individuals naturally differ in their talking and writing styles, and those differences are hard to observe directly. Nevertheless, because these styles are unique from person to person, there is a tendency for the identity of writers to be decipherable simply by observing the patterns of their writing, especially the formation of words, phrases and clauses. Sir Francis Galton was the first to hypothesise that natural language terms might reflect personality differences in humankind [3]. Furthermore, Hofstee suggested that nouns, sentences and actions might carry connotations of personality [4].
On the other hand, for several decades researchers in forensic psychology, the behavioural sciences and law enforcement agencies have worked together to integrate the science of psychology into criminal profiling [5]. A review of the literature on psychology, linguistics and behaviour affirms a strong relationship between personality traits, especially those associated with criminals, and writing/language skills. This raises the question of whether the writing patterns of cybercriminals in social networks can be identified or detected using automatic classifiers; if so, how well do the classifiers perform, and which words or combinations of words are frequently used by cyber predators? To answer these questions, we conducted an empirical investigation [9] (the main study) together with two smaller-scale studies [10, 11] (extending the main study), using textual sources from Facebook and Twitter, the trait descriptions of the Three Factor Personality Model, and sentiment valences. The main study used open-source Facebook [6] and Twitter [7] data as text input, while the data for the two smaller studies were harvested from Twitter using Tweepy, a Python library for accessing the Twitter API. The main study and the second study used data written only in English, while the third study used tweets in Malay (Bahasa Malaysia). In these studies we employed four main classifiers, namely Sequential Minimal Optimization (SMO), Naive Bayes (NB), K-Nearest Neighbour (KNN) and J48, with ZeroR as a baseline, from the Waikato Environment for Knowledge Analysis (WEKA) machine learning tool.
The traits from the Three Factor Model were used in this study because of the model’s widespread use in criminology, because its small number of traits eases the characteristic-categorisation process, and because a large body of empirical work associates the Psychoticism trait with criminal characteristics. Sentiment valences were used to measure the polarity of sentiment terms. The major traits of the Three Factor Model and their associated characteristics are listed in Table 1.

Table 1: Three Factor Model Traits and its characteristics [8].

Traits Specific Characteristics
Extraversion Sociable, lively, active, assertive, sensation seeking, carefree, and dominant.
Neuroticism Anxious, depressed, guilt feelings, low self-esteem, tense, irrational, and moody.
Psychoticism Aggressive, egocentric, impersonal, impulsive, antisocial, creative and tough-minded.

The three studies used the same research framework: Step 1: data collection and preprocessing (data cleansing, stemming, part-of-speech tagging); Step 2: data annotation; Step 3: automatic classification by the four classifiers; Step 4: performance analysis and identification of criminal-related terms (using the chi-square method). The following tables show the performance of the machine learning classifiers in these studies and the list of terms identified as associated with criminal behaviour. The Synthetic Minority Over-sampling Technique (SMOTE), a class balancing method, was used to overcome the unbalanced volume of class instances.

Table 2: Accuracy of classifiers with and without the SMOTE class balancing method [10].

Performance is reported as True Positive (TP) / False Positive (FP) rates (%).

Type/Classifier | ZeroR       | NB          | KNN         | SMO         | J48
Without SMOTE   | 47.2/52.73  | 58.18/41.82 | 47.27/52.73 | 72.73/27.27 | 78.18/21.82
With SMOTE      | 40.6/59.38  | 68.75/31.25 | 53.13/46.88 | 73.44/26.56 | 75.00/25.00
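The SMOTE idea behind the “With SMOTE” condition in Table 2 can be sketched as follows. This is a simplified toy version with invented feature vectors; the actual experiments would have used an established implementation such as WEKA’s SMOTE filter.

```python
import random

# Toy sketch of SMOTE: synthetic minority samples are interpolated between
# a minority point and one of its minority-class nearest neighbours.
# The minority feature vectors below are invented for illustration.

def smote(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic points from the minority feature vectors."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours by squared Euclidean distance (excluding base)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic

minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2)]
new_points = smote(minority, n_new=3)
```

Each synthetic point lies on the line segment between two real minority samples, so the oversampled class occupies the same feature-space region rather than merely duplicating instances.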


Table 3: Accuracy of classifiers measuring the effect of cross-validation folds [11].

Performance is reported as True Positive (TP) / False Positive (FP) rates (%).

Folds | ZeroR     | NB        | KNN       | SMO       | J48
3     | 53.3/46.7 | 80.0/20.0 | 63.3/36.7 | 73.3/26.7 | 50.0/50.0
5     | 53.3/46.7 | 90.0/10.0 | 56.7/43.3 | 70.0/30.0 | 63.3/36.7
10    | 53.3/46.7 | 90.0/10.0 | 56.7/43.3 | 86.3/16.7 | 70.0/30.0


Table 4: Terms highly associated with criminal behaviour [9].

Facebook                               | Twitter
Unigram | Bigram    | Trigram          | Unigram | Bigram   | Trigram
Damn    | The hell  | I want to        | Suck    | Damn it  | A big ass
Shit    | Damn it   | Damn it I        | Adore   | The hell | A bit more
Fuck    | Hell i    | Is a bitch       | Annoy   | A bitch  | A bitch and
Hell    | My Fuck   | What the Fuck    | Asshole | A damn   | A damn good
Ass     | The shit  | What the hell    | Shit    | A fuck   | All fuck up
Suck    | Damn you  | I feel like      | Fuck    | A hell   | A great fuck
Bad     | The fuck  | The hell I       | Hell    | Damn you | A great night
Feel    | A bitch   | -                | Cute    | A shit   | A pain in
Hate    | Fuck yeah | -                | Damn    | My ass   | A fuck off
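The chi-square term identification of Step 4, which produces lists like Table 4, can be sketched minimally as follows. The counts and terms here are invented toy data, not figures from the studies.

```python
# Sketch of chi-square term scoring: measure each term's association with
# a target class via a 2x2 contingency table over documents.

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 contingency table:
    n11 = docs with term, in class;     n10 = docs with term, not in class;
    n01 = docs without term, in class;  n00 = docs without term, not in class."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

# term -> (n11, n10, n01, n00) over a hypothetical 100-post corpus
counts = {
    "damn": (30, 5, 20, 45),   # concentrated in the abusive class
    "hello": (10, 12, 40, 38), # roughly class-independent
}
ranked = sorted(counts, key=lambda t: chi_square(*counts[t]), reverse=True)
# ranked[0] == "damn": the class-dependent term scores highest
```

Terms whose occurrence is strongly class-dependent get large chi-square scores, while terms distributed evenly across classes score near zero.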


In conclusion, our investigation showed that J48 performed better than the other classifiers both with and without the SMOTE class balancing technique, and that the effect of cross-validation varies by classifier; overall, however, Naïve Bayes performed best across the cross-validation experiments. The investigation also produced a list of words likely to be used by cybercriminals, based on the language-model specification. In future work we plan to use deep learning methods to analyse content related to cyberterrorism, and we welcome collaboration on social network cyberterrorism textual data.


  1. O. Goldhill. Digital detectives: solving crimes through Twitter. The Telegraph, 2013.
  2. N. Majumder, S. Poria, A. Gelbukh and E. Cambria. Deep learning based document modeling for personality detection from text. IEEE Intelligent Systems, 32(2):74–79, 2017.
  3. E. Sapir. Language: An Introduction to the Study of Speech. New York: Harcourt, Brace, 1921.
  4. G. Matthews, I. J. Deary and M. C. Whiteman. Personality Traits (2nd edition). Cambridge University Press, 2003.
  5. J. K. Gierowski. Podstawowa problematyka psychologiczna w procesie karnym. In: Psychologia w postępowaniu karnym, Lexis Nexis, Warszawa, 2010.
  6. F. Celli, F. Pianesi, D. Stillwell and M. Kosinski. Workshop on Computational Personality Recognition (Shared Task). In Proceedings of WCPR13, in conjunction with ICWSM-2013.
  7. A. Go, R. Bhayani and L. Huang. Twitter Sentiment Classification using Distant Supervision, 2009.
  8. C. van Dam, J. M. A. M. Janssens and E. E. J. De Bruyn. PEN, Big Five, juvenile delinquency and criminal recidivism. Personality and Individual Differences, 39 (2005) 7–19. DOI:10.1016/j.paid.2004.06.016.
  9. S. Sagadevan. Comparison of Machine Learning Algorithms for Personality Detection in Online Social Networking. Thesis, 2017.
  10. M. Baqir Hakim. Profiling Online Social Network (OSN) User Using PEN Model and Dark Triad Based on English Text Using Machine Learning Algorithm, 2017 (in review).
  11. N. I. Ridzuwan. Online Social Network User-Level Personality Profiling Using PEN Model Based on Malay Text, 2017 (in review).



Nurul Hashimah Ahamed Hassain Malim (Nurul Malim) received her B.Sc (Hons) in computer science and M.Sc in computer science from Universiti Sains Malaysia, Malaysia. She completed her PhD in 2011 from The University of Sheffield, United Kingdom. Her current research interests include chemoinformatics, bioinformatics, data analytics, sentiment analysis and high-performance computing. She is currently a Senior Lecturer in the School of Computer Sciences, Universiti Sains Malaysia, Malaysia.

Digital Earth Australia (DEA): From Satellites to Services

Mr Neal Evans1, Dr Trevor Dhu2, Mr David Gavin3, Dr David Hudson4, Mr Trent Kershaw5, Dr Leo Lymburner6, Ms Alla Metlenko7, Mr Norman Mueller8, Mr Simon Oliver9, Chris Penning10, Dr Medhavy Thankappan11, Ms Alicia Thomson12

1-12Geoscience Australia, Canberra, AUS,



The 2017/18 Budget identified over $2 billion of investment over the next four years in monitoring, protecting or enhancing Australia’s land, coasts and oceans, including: the National Landcare Program; the Commonwealth Marine Reserves implementation; implementation of the Murray-Darling Basin Plan and the water reform agenda; support for State and Territory governments to develop secure and affordable water infrastructure; and improving water quality and scientific knowledge of the Great Barrier Reef.

Geoscience Australia’s contribution to this investment will be a program known as Digital Earth Australia (DEA), which will directly support these investments by providing an evidence base for the design, implementation and evaluation of policies, programs and regulation. It will also give industry access to stable, standardised data and imagery products from which it can innovate to produce new value-added products and services.


DEA is an analysis platform for satellite imagery and other Earth observations. Today, it translates 30 years of Earth observation data (acquired roughly every two weeks at 25 metre resolution) and tracks changes across Australia in unprecedented detail, identifying soil and coastal erosion, crop growth, water quality, and changes to cities and regions. When fully operational, DEA will provide new information for every 10 square metres of Australia, every five days.

DEA uses open source standards, building on the international Open Data Cube technology, which is supported by the Committee on Earth Observation Satellites (CEOS) [2].



Figure 1: WOfS, Gulf of Carpentaria, QLD
Figure 2: Intertidal model over Exmouth Gulf, WA

Initial examples of how DEA will support government, industry and the research community through improved data include Water Observations from Space (WOfS), a continent-scale map of the presence of surface water; and the Intertidal Extents Model (ITEM) that consistently maps Australia’s vast intertidal zone to support coastal planning.

WOfS is already helping to improve the Australian Government’s understanding of water availability, historical flood inundation and environmental flows, while ITEM has yielded the first continent-wide tidal extent map for Australia and is being used by the Queensland government to assist in their intertidal and subtidal habitat mapping program.
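The per-pixel summary idea behind WOfS can be illustrated with a toy sketch: given a time-stacked series of boolean “water observed” grids, compute how often each pixel was wet. The grids below are invented; the real product is derived from decades of Landsat observations via the Open Data Cube.

```python
# Conceptual sketch of a WOfS-style water-frequency summary over a
# per-date stack of 0/1 "water observed" grids (toy 3x3 data).

def water_frequency(stack):
    """stack: list of 2-D lists of 0/1 water flags, one grid per date.
    Returns, for each pixel, the fraction of dates on which it was wet."""
    dates = len(stack)
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [sum(grid[r][c] for grid in stack) / dates for c in range(cols)]
        for r in range(rows)
    ]

observations = [
    [[1, 0, 0], [1, 1, 0], [0, 0, 0]],  # date 1
    [[1, 0, 0], [1, 0, 0], [0, 0, 0]],  # date 2
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],  # date 3
    [[1, 0, 0], [1, 1, 0], [0, 0, 0]],  # date 4
]
freq = water_frequency(observations)
# freq[0][0] == 1.0 (permanent water); freq[2][2] == 0.0 (always dry)
```

Intermediate frequencies distinguish ephemeral water bodies and flood-affected pixels from permanent water, which is the information WOfS exposes for flood and water-availability analysis.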


DEA will benefit government departments and agencies that need accurate and timely spatial information on the health and productivity of Australia’s landscape. This near real-time information can be readily used as an evidence base for the design, implementation, and evaluation of policies, programs and regulation, and for developing policy advice.

DEA will also support agencies to better monitor change, protect and enhance Australia’s natural resources, and enable more effective responses to problems of national significance. Information extracted from Earth observation data will reduce risk from natural hazards such as bushfires and floods, assist in securing food resources, and enable informed decision making across government. Economic benefits are expected to be realised from better targeted government investment, reduced burden on the recipients of government funding, and increased productivity.

The DEA Program is developing joint projects to deliver products that address policy challenges across a range of Australian Government departments.


We invite you to be part of the future of DEA, as we build new products and tools to support Australian Government agencies to better monitor, protect, and enhance Australia’s natural resources.

Contact us to discuss how DEA can inform and support the work of your agency.




  1. Lewis, A., Oliver, S., Lymburner, L., Evans, B., Wyborn, L., Mueller, N., Raevksi, G., Hooke, J., Woodcock, R., Sixsmith, J., Wu, W., Tan, P., Li, F., Killough, B., Minchin, S., Roberts, D., Ayers, D., Bala, B., Dwyer, J., Dekker, A., Dhu, T., Hicks, A., Ip, A., Purss, M., Richards, C., Sagar, S., Trenham, C., Wang, P. and Wang, L-W., The Australian Geoscience Data Cube – Foundations and lessons learned, Remote Sensing of Environment (in press).
  2. CEOS. Available from, accessed 28 Aug 2017
  3. Mueller, N., Lewis, A., Roberts, D., Ring, S., Melrose, R., Sixsmith, J., Lymburner, L., McIntyre, A., Tan, P., Curnow, S., Ip, A. Water observations from space: Mapping surface water from 25 years of Landsat imagery across Australia, Remote Sensing of Environment 174, 341-352, ISSN 0034-4257.
  4. Sagar, S., Roberts, D., Bala, B., Lymburner, L., 2017. Extracting the intertidal extent and topography of the Australian coastline from a 28 year time series of Landsat observations.Remote Sensing of Environment 195, 153–169.
  5. GA eCat Record, created 28 Aug 2017
