Accelerating HPC innovation for today and tomorrow

Steve Tolnai1

1Hewlett Packard Enterprise, HPC & AI Lead, Asia Pacific & Japan

 

System developers, scientists and researchers face obstacles when deploying complex new HPC technologies, such as energy-efficiency, reliability and resiliency requirements, and the need to develop software that exploits HPC hardware. All of these can delay technology adoption and critical projects. Meanwhile, the demand for real-time insights and intelligence for deep learning is growing at breakneck speed. HPE and Intel have forged a multi-faceted alliance to advance customer innovation and expand HPC accessibility to enterprises of all sizes. Join this session to discover how HPE’s ecosystem of industry partnerships is delivering breakthroughs in HPC deployment, security, power use and density, making supercomputing more accessible and affordable for today and tomorrow.

Spatial Performance Environment Command Transmission Realities for Astronauts (SPECTRA)

Sarah Jane Pell1

1Artist-Astronaut, Adjunct Assoc. Professor, Faculty of Engineering, Office of the Engineering Dean, and Faculty of Art, Design and Architecture, Monash University

 

Sarah Jane Pell has performed with gesture-controlled robots underwater, dragged prototype 360° cameras up Mt. Everest, launched artworks into space, and bounced images off the Moon’s surface via radio waves. The real work is in recognizing the signals as a dialogue with the extreme environment.

In her 2018 eResearch Australasia Keynote, Pell will explore the role that art and technology have played, and continue to play, in shifting understandings of human exploration. What are the stakes—social, ethical, ontological—in appropriating astronautics for artistic purposes, for example? What are the consequences, both intended and not, of placing artworks/artists into diverse cultural contexts, from the space analogue to the launch vehicle? What would it take to RSVP YES to #DearMoon?

An Australia Council Fellowship in Emerging and Experimental Arts supported her latest Performing Astronautics projects. She highlights the impact of cinematic-robotic and immersive visualization technologies, which are vital to understanding and assisting human movement, including in her work as an artist-astronaut. Pell is uniquely positioned as a commercial diver, commercial spaceflight candidate, and spacesuit validation test pilot. She qualified as a polar suborbital mission specialist, and served as the Simulation Astronaut for the Project Moonwalk subsea lunar analogue human-robot cooperation trials in 2016, Artist-in-Residence for Mars Desert Research Station Crew 188, and Commander of the Lunares 3 Crew Spectra Mission in 2018. A former Chair of the European Space Agency Topical Team Art & Science, graduate of the International Space University, and NASA consultant, Pell champions sci-art approaches and human spaceflight as priorities for the new Australian Space Agency.

Harnessing The Wisdom of Crowds: The SWARM Project

Prof. Richard O. Sinnott1 & the SWARM Team1

1University of Melbourne, rsinnott@unimelb.edu.au

 

The Smartly-assembled, Wiki-style Argument Marshalling (SWARM) project was funded by the US Intelligence Advanced Research Projects Activity (IARPA) as part of the Crowdsourcing Evidence, Argumentation, Thinking and Evaluation (CREATE) program. The project formally commenced in January 2018 and has been awarded up to $19m. SWARM is one of four projects funded globally through the IARPA CREATE program. These projects are tasked with supporting improved reasoning to aid the intelligence community by leveraging the wisdom of crowds. Whilst previous IARPA programs have demonstrated the benefits of leveraging the wisdom of crowds to obtain better answers, the actual reasoning and deliberation behind what makes a good answer remain unclear. This is the primary focus of SWARM.

The evaluation of the SWARM platform and the other platforms is currently being undertaken by an independent crowd managed by IARPA and their Test & Evaluation team. This crowd will be organised into separate teams, and each team will be assigned a set of questions requiring reasoning and evaluation over several months, with the aim of producing the most highly regarded answers, i.e. those answers (hypotheses) with the best reasoning and presentation.

This presentation will cover the overarching goals of SWARM and the underpinning technical solutions that have been developed, including the mobile applications built to encourage crowd participation. The talk will also briefly cover early (non-funded) SWARM work exploring the extent to which deep learning approaches can be used to automate the assessment of collective reasoning.


Biography:

Professor Richard O. Sinnott is the Director of eResearch at the University of Melbourne and Chair of Applied Computing Systems. In these roles he is responsible for all aspects of eResearch (research-oriented IT development) at the University. He has been lead software engineer/architect on an extensive portfolio of national and international projects, with a specific focus on research domains requiring finer-grained access control (security). He has over 300 peer-reviewed publications across a range of applied computing research areas.

Laying the foundation for Australian participation in international eResearch networks in Solid Earth and Environmental Science

Simon Cox1, Erin Robinson2, Adrian Burton3, Ben Evans4, Lesley Wyborn5, Tim Rawling6

1CSIRO Land and Water, Melbourne, Australia, simon.cox@CSIRO.au

2Earth Science Information Partners, Boulder, USA, erinrobinson@esipfed.org

3Australian Research Data Commons, Canberra, Australia, adrian.burton@ardc.org.au

4National Computational Infrastructure, Canberra, Australia, ben.evans@anu.edu.au

5National Computational Infrastructure, Canberra, Australia, lesley.wyborn@anu.edu.au

6AuScope Ltd, Melbourne, Australia, tim.rawling@unimelb.edu.au

 

Globally, significant government investments in Australia, the USA and Europe are building solid Earth and environmental science eResearch infrastructures to facilitate the next generation of transdisciplinary research addressing pressing geoscience and environmental science issues within the constraints of social impacts and sustainable development. These eResearch investments are developing best practices for both cyberinfrastructure development and data issues such as data management/stewardship, vocabularies and common data services. In recent years there has also been an additional emphasis that publicly funded data collection and software development projects align with the FAIR principles (Findable, Accessible, Interoperable and Reusable) of Wilkinson et al. [1]. The FAIR principles are not easy to implement, and many find them challenging, particularly those concerning interoperability.

Fortunately, the rules of almost all research funding schemes are that developments, wherever possible, will be open source and that access will primarily be on merit. These rules create an ideal environment for sharing developments in software, tools, data services, vocabularies, etc., particularly as solid Earth and environmental science data and supporting infrastructures have many common and exploitable patterns that cross institutional, community, national and continental boundaries.

Some of the existing major solid Earth and environmental eResearch infrastructure initiatives in Australia, the USA and Europe include:

  • AuScope, an Australian National Collaborative Research Infrastructure Strategy (NCRIS) funded capability which provides research infrastructure to the Australian solid Earth science communities;
  • The Australian Integrated Marine Observing System (IMOS) NCRIS capability has a portfolio of ten facilities that undertake systematic and sustained observing of Australia’s vast marine areas. Any data collected by these facilities can be discovered, accessed, and downloaded via the Australian Ocean Data Network;
  • The Australian Terrestrial Ecosystems Research Network (TERN) NCRIS capability provides researchers with open access to Australia’s land-based ecosystem monitoring infrastructure, data and research tools, thus contributing to a broader understanding and long-term sustainable management of Australia’s ecosystems across three key measurement themes: biodiversity, carbon & water, and land & terrain;
  • The National Computational Infrastructure (NCI), partly funded by NCRIS, with operational funding provided through a formal collaboration with CSIRO, the Bureau of Meteorology, The Australian National University, Geoscience Australia, the Australian Research Council and others. NCI has built a major integrated research data platform (10+ PBytes) of national reference data collections spanning climate, coasts, oceans, and geophysics through to astronomy, bioinformatics, and the social sciences domains. This platform is connected to a 1.7 PFlop HPC and services managed on tightly-integrated high-performance cloud infrastructure to support the next generation of data-intensive science;
  • The Australian Research Data Commons (ARDC), established in 2018, is building on and strengthening the work of ANDS, Nectar, and RDS that supported Australian researchers across multiple domains with the Data Enhanced Virtual Laboratory and Research Data Cloud programs. ARDC will focus on partnering with other capabilities to improve modern data-intensive, cross-disciplinary research in Australia within the context of global collaborative research. ARDC has a particular focus on sector-wide collaborative action, information sharing and community building;
  • Earth Science Information Partners (ESIP), a US-based independent forum for the Earth science data and technology communities, which has built a community of practice in the USA, supported by NASA, NOAA and the USGS, to address topics such as data stewardship, data citation and documentation. It has become a brains trust and professional home for the Earth science data and informatics community, where both peer-led education and training and the co-development of conventions, practices and guidelines have helped make Earth science data more interoperable;
  • EarthCube was initiated by the US National Science Foundation (NSF) in 2011 to transform geoscience research by developing cyberinfrastructure to improve access, sharing, visualization, and analysis of all forms of geosciences data and related resources. As a community-governed effort, EarthCube’s goal is to enable geoscientists to tackle the challenges of understanding and predicting the complex and evolving solid Earth, hydrosphere, atmosphere, and environmental systems. An important component is the EarthCube Council for Data Facilities, which seeks increased coordination, collaboration, and innovation in the acquisition, curation, preservation, and dissemination of geoscience data, tools, models, and services across existing and emerging geoscience data facilities;
  • The European Plate Observing System (EPOS) is a European Union (EU) Horizon 2020 research and innovation program which supports integrated use of data products and facilities from distributed research infrastructures for European solid Earth science. EPOS brings together Earth scientists, national research infrastructures, ICT experts, decision makers, and the public to develop new concepts and tools for addressing questions concerning geo-hazards and those geodynamic phenomena relevant to the environment and human welfare; and
  • ENVRIplus, also an EU Horizon 2020 project, brings environmental and Earth system research infrastructures, projects and networks together with technical specialist partners to create a more coherent, interdisciplinary and interoperable cluster of environmental research infrastructures across Europe.

There are many parallels across these nine existing eResearch infrastructure initiatives, but currently they are somewhat disconnected. Each is focused more on national/continental scale issues, in part because most funding initiatives are nationally generated. Clearly there are common technological and science challenges that each is trying to solve in isolation, and although standards, vocabularies, formats, etc. are cohesive within each community, there are sufficient differences to make it hard to integrate data across them.

The time is ripe to synchronise efforts to create globally connected networks of solid Earth and environmental science data, information infrastructures, software and researchers with a goal of making scarce eResearch funding more effective by reducing duplication, increasing efficiency, and promoting partnerships and adoption across communities initially within the solid Earth and environmental sciences and then potentially to other domains. Already there are embryonic proposals to create integrated international networks to coordinate and harmonize these efforts. If Australia is to be part of the trend towards globalisation of high-quality solid Earth and environmental research projects then it needs to be a key collaborator in their development, to ensure Australian perspectives are included.

The recently formed ESIP/RDA Earth, Space and Environmental Sciences Interest Group is already starting to coordinate and harmonize efforts across the international solid Earth and environmental research community. In Australia, associated satellite activities such as ESIP downunder (E2SIP) have formed an ESIP cluster in collaboration with the National Earth and Environment Sciences Facilities Forum.

An additional consideration is how to extend efforts from the research sector into government and industry initiatives and create a truly global network of solid Earth and environmental science data infrastructures to underpin fundamental research into global geoscience processes within the context of societal impacts and sustainable development. Currently many government/industry initiatives are poorly connected to equivalent activities in the research sector.

REFERENCES

  1. Wilkinson, M.D., Dumontier, M., Aalbersberg, IJ.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J-W., Silva Santos, L.B. da, Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., Hoen, P.A.C. ‘t, Hooft, R., Kuhn, T., Kok, R., Kok, J.N., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., Schaik, R. van, Sansone, S-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., Lei, J. van der, Mulligen, E. van, Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K.J., Zhao, J., Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3. https://doi.org/10.1038/sdata.2016.18 Accessed 18 August 2018.

Biography:

Simon has been researching standards for publication and transfer of Earth and environmental science data since the emergence of the World Wide Web. Starting in geophysics and mineral exploration, he has engaged with most areas of environmental science, including water resources, marine data, meteorology, soil, ecology and biodiversity. He is principal- or co-author of a number of international standards, including Geography Markup Language and Observations & Measurements. The value of these is in enabling data from multiple origins and disciplines to be combined more effectively, which is essential in tackling most contemporary problems in science and society. His current work focuses on aligning science information with semantic web technologies and linked open data principles, and on the formalization, publication and maintenance of controlled vocabularies and similar reference data.

Dr Cox is the author of over 40 journal articles, 30 technical specifications and international standards, and 150 conference papers.

Coordinated identifier infrastructures enabling Geoscience researchers to meet future directions in scholarly communications

Natasha Simons1, Julia Martin2, Mingfang Wu3, Adrian Burton4, Jens Klump5, Keith Russell6, Gerry Ryder7, Lesley Wyborn8, Tim Rawling9

1Australian Research Data Commons, Brisbane, Australia, natasha.simons@ardc.edu.au    

2Australian Research Data Commons, Canberra, Australia, Julia.Martin@ardc.edu.au

3Australian Research Data Commons, Melbourne, Australia, Mingfang.Wu@ardc.edu.au

4Australian Research Data Commons, Brisbane, Australia,  Adrian.Burton@ardc.edu.au

5CSIRO Mineral Resources, Perth, Australia, jens.klump@csiro.au

6Australian Research Data Commons, Melbourne, Australia, Keith.Russell@ardc.edu.au

7Australian Research Data Commons, Adelaide, Australia, gerry.ryder@ardc.edu.au

8 National Computational Infrastructure, ANU, Canberra, Australia, Lesley.Wyborn@anu.edu.au

9AuScope, Melbourne, Tim.Rawling@unimelb.edu.au

 

INTRODUCTION

In modern research, much of geoscience, and equivalent investigations in the environmental sciences, is based on observations and measurements of real-world phenomena, which range from simple visual observations of small hand-sized physical samples to voluminous ex-situ measurements made using satellite or laboratory/sensor instruments. Information on samples, digital data and computational methods is rarely captured in traditional publications. Fifty years ago, most data that underpinned a scholarly publication could be represented in typeset tables, but with the advent of the digital age and the computerisation of instruments, the volumes of data collected became too large to present as tables within a paper. At best, data was then included as a supplement to the paper accessible through the journal, or else could be obtained ‘by contacting the author’. Such approaches limit the ability to test the veracity and reproducibility of a publication, do not guarantee accessibility and persistence of input research artefacts into the future, and do not ensure that they can be reused for purposes beyond the original use case. The Geoscience Paper of the Future was recently proposed to enable researchers to fully document, share, and cite all their research products, including physical samples, data, software, and computational provenance [1], and at about the same time the Findable, Accessible, Interoperable and Reusable (FAIR) principles [2] emerged. Today, publishers do not have a consistent way of citing the data underpinning a publication, whilst details on how to reference or access physical specimens or software are rarely provided. Interpretations of the FAIR principles can also be quite inconsistent.

To address this complex issue, a 2017 grant from the American Laura and John Arnold Foundation was awarded to the American Geophysical Union (AGU) and other partners (including AuScope, the National Computational Infrastructure and the Australian Research Data Commons) to significantly improve the interconnection of data, samples, software and literature in the Earth and space sciences, based around the FAIR principles. The key objectives of the project are that:

  1. Publishers will follow consistent policies for sharing and citing data, samples and software used in the scholarly literature and will move from having these as supplements to the publication to using trusted repositories for publishing supporting research artefacts;
  2. Open repositories for Earth and environmental sciences will enable those policies and other data applications by providing persistent identifiers, rich metadata, and related services for the data, software and samples they hold;
  3. Geoscience researchers will know how to consistently share, document, and reference data, samples and software and use globally persistent identifiers to uniquely identify their research outputs.

These objectives finally provide a response to the inevitable change required in scholarly communication driven by the emergence of computers and the dawn of the age of digital data collection and curation fifty years ago, followed by the need for ever more complex software to process ever-increasing data volumes. However, effective implementation will require a significant cultural change in today’s research practices, many of which come from the pre-digital era. A critical component of the AGU-led project is promoting the value of citation with identifiers to researchers, so that they know how to use identifiers effectively in publications and ensure credit is given where credit is due.

PROMOTING THE VALUE OF IDENTIFIERS TO RESEARCHERS

Although identifiers have been commonplace for scholarly publications for some time, and most Australian researchers have an ORCID, few realise the power of using equivalent identifier systems for all their research artefacts, including physical samples, software and data.

1. Advantages of Using Sample Identifiers

The International Geo Sample Number (IGSN), used on five continents to uniquely identify physical samples, allows researchers firstly to gain credit for sample collection and preparation, and secondly to trace where other analytical work is published on samples that they collected and curated. As the usage of IGSN grows, it will also be possible to locate other samples from the same geographical features (e.g. a borehole or a remote island) to obtain a more complete overview of how new data generated by a researcher relates to existing data in the literature. Likewise, funders can trace where a sampling project they funded has resulted in high-impact publications.

2. Advantages of Using Software Identifiers

Proper use of identifiers and citation for software means that researchers can trace where their software has been used by others in publications and be acknowledged for this work. Further, by being able to search registers of appropriately described and cited software, researchers can also reduce the ‘Time to Science’, as they do not waste time rewriting complex code that already exists.

3. Advantages of Using Identifiers for Datasets that Underpin Publications

Increasingly, unique identifiers for data and proper citation of that data are being used for career advancement. For example, through the linking of identifiers, a researcher is able to track when their datasets are used in high-impact papers by other researchers and gain credit. In addition, a persistent identifier such as a DOI ensures long-term access to the dataset, enabling reproducibility of the current research and reuse for new research directions.

CURRENT ARDC INFRASTRUCTURES TO PERSISTENTLY IDENTIFY RESEARCH ARTEFACTS

Once researchers embrace the need for identifiers as part of their research ecosystem, they must have access to infrastructures that enable the persistent and unique identification of, and access to, their research artefacts throughout their career and beyond. Over the last 10 years, the Australian Research Data Commons (ARDC) and its predecessors have been building infrastructure for data citation that assists researchers to publish data in line with FAIR and ensures proper recognition and citation of their data in their own publications and in any subsequent publications that also use their data. Details are available at https://www.ands.org.au/working-with-data/citation-and-identifiers/data-citation.

In the recent ARDC/AuScope/NCI-funded Geosciences Data-enhanced Virtual Laboratory project, the ARDC has been working with the geoscience community to develop equivalent persistent identifier systems for samples and software. Australian geoscience researchers can obtain IGSNs for their physical samples (specimens) at http://www.auscope.org.au/igsn-info/, and information about citation of physical samples is available at http://www.ands.org.au/working-with-data/citation-and-identifiers/igsn. An ARDC guide to software citation is available at https://www.ands.org.au/working-with-data/citation-and-identifiers/software-citation.

Combined, these efforts will ensure that Australian geoscience researchers can meet the new demands now emerging from Earth and space science publishers and can move towards the Geoscience Paper of the Future. The ARDC identifier systems recently developed for physical samples and software are easily portable to other physical sciences, such as the environmental, marine and bio domains, and will help ensure that research artefacts are Findable and Accessible for current and future generations of researchers and Reusable for purposes beyond those for which they were originally collected. It is accepted that Interoperability will still take some time, but plans are already being developed.

REFERENCES

  1. Gil, Y., David, C.H., Demir, I., Essawy, B.T., Fulweiler, R.W, Goodall, J.L., Karlstrom, L., Lee, H., Mills, H.J., Oh, J.H., Pierce, S.A., Pope, A., Tzeng, M.W., Villamizar, S.R., and Yu, X., 2016. Toward the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance. Earth and Space Science, 3, 388-415. https://doi.org/10.1002/2015EA000136 Accessed 18 August 2018.
  2. Wilkinson, M.D., Dumontier, M., Aalbersberg, IJ.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J-W., Silva Santos, L.B. da, Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., Hoen, P.A.C. ‘t, Hooft, R., Kuhn, T., Kok, R., Kok, J.N., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., Schaik, R. van, Sansone, S-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., Lei, J. van der, Mulligen, E. van, Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K.J., Zhao, J., Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3. https://doi.org/10.1038/sdata.2016.18 Accessed 18 August 2018.

Biography:

Natasha Simons is a Research Data Management Specialist with the Australian National Data Service. Located at Griffith University in Brisbane, Natasha serves on the Council of Australian University Librarians Research Advisory Committee and is an ORCID Ambassador. She is an author and reviewer of papers related to library and information management and co-authored a 2013 book on digital repositories. Natasha was the Senior Project Manager for the Griffith Research Hub, which won awards from Stanford University and VALA. She is an advocate for open data, open repositories and ORCID.

A game changer in the paradigm of silo-based service delivery – the Integrated Project Delivery of secure digital data capture for clinical trials at the University of Sydney

Dr Daniele Vicari1, Jessica Cornock2

1Research Portfolio – SIH, University of Sydney, Sydney, Australia, daniele.vicari@sydney.edu.au

2ICT, University of Sydney, Sydney, Australia, jessica.cornock@sydney.edu.au

 

Abstract

With no unified system to collect clinical trial data, researchers were often relying on laborious workflows, including convoluted spreadsheets and paper questionnaires. Aiming to provide a robust and supported data capture and survey capability for researchers, Information and Communication Technology (ICT) and the Research Portfolio started a collaboration in 2017 and ran a joint project to enhance the University’s installation of REDCap.

REDCap is an established, secure web application for building and managing online surveys and databases. It is ideal for collecting and managing participant data with features supporting longitudinal data collection, complex team workflows and exports to a range of statistical analysis programs.

Description

Most universities have been challenged to deliver the best infrastructure in a rapidly evolving research ecosystem, often with each sector working independently, creating a gap between end users and professional/operational teams. To break the silo-based paradigm, ICT teamed up with other professional services in the Research Portfolio and adopted an Integrated Project Delivery method to implement a digital data capture platform for clinical trials and other studies. We present here the strategy and outcomes of this initiative. Agile, joint decision-making allowed several key capabilities to be delivered from 2017 to 2018. For example:

  • REDCap was updated from version 5.x to 7.4 (now 8.3), providing three years’ worth of development updates.
  • REDCap is now available to any researcher with an Australian Access Federation (AAF) identity, increasing the potential for collaboration and decreasing the administrative burden on ICT support.
  • REDCap is now cloud-hosted on AWS, which has made it a high-availability system with minimal expected unscheduled downtime.
  • In-house training and one-to-one consultations allow the platform to be used widely, including for small clinical trials and other research projects (highlighted in Figure 1).
  • Most importantly, a collaborative technical and business support structure has been established across ICT, Digital Research Support and Clinical Trials support.

Figure 1: REDCap project types in the University. Translational research 1 (applying discoveries to the development of trials and studies in humans); Translational research 2 (enhancing adoption of research findings and best practices into the community); Repository (developing a data or specimen repository for future use by investigators).

Conclusion

All of these enhancements have led to a significant increase in the number of users and projects utilising this system.

The initiative of integrating teams such as ICT, Digital Research Support and Clinical Trials support demonstrated an effective and agile approach to delivering a clinical trial data capture tool while allowing many other researchers across diverse disciplines to make use of the platform. Researchers are able to collect their data in a structured database with proper access control (university credentials) and a rigorous audit trail, changing the culture of data collection while improving research integrity and compliance.

This Integrated Project Delivery method established active teamwork that is now improving the maintenance and feature enhancement of REDCap, including automated upgrades, integration with other systems, and customised project developments.

In addition, service delivery can be rapidly optimised because the research support team serves as a conduit between the researchers (end users) and the service provider (ICT).


Biography:

Daniele has worked as a biochemistry researcher and has kept her passion for teaching and training while working in several educational institutions in Brazil, the USA, Switzerland, and Australia. Drawing on her extensive experience as a mentor and teacher, she currently supports, trains and advises staff and student researchers in how to use digital tools to achieve best research data management practices at the University of Sydney.

Hacky Hour: Demographics, Building Communities and Sharing Local Expertise

Amanda Miotto1, Nick Hamilton2

1Griffith University/QCIF, Brisbane, Australia, a.miotto@griffith.edu.au
2University of Queensland, St Lucia, Australia

 

Description

Researchers starting their journey into data science often have an ambiguous path to follow. While online data science classes are plentiful, it can be challenging for these researchers, who have often never seen programming code before, to know where to start or how to apply methods to their own data.

In Queensland, many of the universities, including Griffith, UQ, QUT, USQ and JCU, have been supporting these researchers by running ‘Hacky Hours’: open sessions where researchers can meet research software engineers and other researchers doing similar work to share knowledge, ask questions freely and come together to work on projects in a friendly environment.

Hacky Hour groups are often successfully paired with workshops such as Software Carpentry to complement learning after an initial introduction to a programming language or skill. This is also a good segue into discussions about practical reproducibility practices (such as version control with Git and naming conventions) and data management (such as backups and data sensitivity).

These community-building groups connect researchers who are often left in silos and expand expertise within the university. People come along both looking for help and offering to volunteer their time. Some clients even come along simply to socialize and meet others at their university.

Hacky Hour communities have also been a way to connect with the wider research and technical communities, providing links to relevant meetups, hackathons and workshops offered outside the university, and to national resources such as NeCTAR cloud compute and virtual labs, local High Performance Computing (HPC) and other NCRIS activities. As many researchers operate in isolated silos, this is often the first time clients learn about these resources and initiatives. It also leads to attendees becoming involved with the wider community and building networks nationally.

The coordinators of Hacky Hours also work together to build a larger community, sharing ideas for events, relevant resources and lessons learnt. This material has been collated into a ‘Hacky Hours’ handbook available here: https://github.com/amandamiotto/HackyHourHandbook

Over the past three years, demographics have been collected across Queensland to show the diversity of the audience and of the inquiries. Trends are studied to shape more targeted approaches; for example, UQ now hosts monthly bioinformatics-specific Hacky Hours. The submitted poster highlights the demographics and trends observed in these sessions.


Biography:

Amanda Miotto is an eResearch Senior Analyst for Griffith University and QCIF. She started off in the field of Bioinformatics and learnt to appreciate the beauty of science before discovering the joys of coding. As well as working on platforms around HPC, microscopy & scientific database portals, she is also heavily involved in Software Carpentry, Hacky Hours and was the main organizer for Research Bazaar Brisbane 2018.

HPC Software Image Test

Gerard Kennedy1, Ahmed Shamsul Arefin2, Steve McMahon3

1Research School of Engineering, ANU, Canberra, Australia
2Scientific Computing, IM&T, CSIRO Canberra, Australia
3IM&T, DST Group Canberra, Australia

 

Introduction

In this work, we present a Software Image Test (SIT) tool that can test the software image of a node or nodes in an HPC cluster system. The script comprises a collection of BATS tests that run in an automated SLURM job, with the outcomes sent to the executing user via email. The results help to decide whether the software image is ready to roll out onto the production cluster.

Development

BATS (Bash Automated Testing System) [1] is a TAP (Test Anything Protocol) [2] compliant testing framework for Bash. It provides a simple way to verify that the programs under test behave as expected. The tests are written in BATS files, which are essentially Bash scripts with special syntax for defining test cases. If every command in a test case exits with a status code of 0 (success), the test is considered to have passed. See an illustrative example below:
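The example run referred to above appears as a figure in the original submission. As a minimal sketch of the BATS syntax (the file name and the commands being checked here are illustrative assumptions, not taken from the SIT scripts themselves):

#!/usr/bin/env bats
# example.bats - each @test block is one test case; it passes only if every
# command inside it exits with status code 0.

@test "hostname command is available" {
  run hostname
  [ "$status" -eq 0 ]
}

@test "nvidia-smi reports at least one GPU" {
  run nvidia-smi -L
  [ "$status" -eq 0 ]
  [ "${#lines[@]}" -ge 1 ]
}

Running the file with the bats interpreter (e.g. bats example.bats, or bats --tap example.bats for TAP-formatted output) prints one line per test case indicating whether it passed or failed.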

TAP has implementations in C, C++, Python, PHP, Perl, Java, JavaScript, and others. We chose the Bash version due to its simplicity and the matching skillsets available across our teams.

Figure 1: BATS tests developed for the HPC Software Image Testing.

With the syntax demonstrated above, we have developed a number of BATS tests (see Figure 1):

  • nvidia.bats: This script contains tests for the node’s GPUs. It uses the Nvidia Validation Suite [3], whose outcomes help to quickly check the CUDA configuration/setup, ECC enablement, etc. It runs the Deployment, Memory, and PCIe/Bandwidth tests, giving a quick overview of the main components of the GPUs.
  • intel.bats: This test runs the Intel Cluster Checker [4]. The package requires ‘config.xml’, ‘packagelist.head’, ‘packagelist.node’ and ‘nodelist’ files to be set up in order to execute successfully. The ‘config.xml’ determines the modules that will be tested and can be altered if the user wishes. Examples of the modules tested include ping, ssh, infiniband, mpi_local, packages (which uses the package lists) and storage. This test requires multiple nodes to run on.
  • benchmark.bats: This test runs the Intel MPI Benchmark [5], which helps to ensure that MPI has been correctly configured on the node(s) in question. This test requires multiple nodes to run on.
  • apoa.bats: This test uses the NAMD [6] ApoA1 simulation and tests OpenMP, MPI, CUDA, etc. configurations.

Further to these, we have developed a few more essential tests, e.g. checking the storage mounts, SLURM partitions, ssh host keys, etc.
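As a hedged illustration of what one such check might look like (the mount point, partition query and host-key path below are assumptions for this sketch, not the contents of the actual SIT scripts):

#!/usr/bin/env bats
# checks.bats - basic node sanity checks (paths and commands are hypothetical)

@test "/scratch is a mount point" {
  run mountpoint -q /scratch
  [ "$status" -eq 0 ]
}

@test "SLURM reports at least one partition" {
  run sinfo -h -o "%P"
  [ "$status" -eq 0 ]
  [ "${#lines[@]}" -ge 1 ]
}

@test "ssh host key is present" {
  [ -f /etc/ssh/ssh_host_rsa_key.pub ]
}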

Execution

In order to execute the SIT script, the user must provide a valid set of input arguments. The possible input arguments are: Partition: the SIT script runs as a batch job, therefore the user needs to define the partition in which the node or set of nodes is located; if the nodes to be tested are spread across multiple partitions, the partitions need to be entered as a comma-separated list. Node(s): the user can input as many nodes as they wish.

Here are four examples of valid initialization commands and input argument combinations:
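The examples themselves appear as a figure in the original submission; the following sketch conveys their general shape (the script name sit.sh and the argument order are hypothetical, introduced only for illustration):

# Hypothetical invocations - script name and argument order are assumptions.
./sit.sh gpu node001                        # one partition, one node
./sit.sh gpu node001,node002                # one partition, several nodes
./sit.sh gpu,general node001,node101        # nodes spread across two partitions
./sit.sh general node101,node102,node103    # several nodes in one partition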

Results

The SIT sends an email to the executing user, as shown in Figure 2. As the test outcomes are sent by email, users do not need to wait at the console. Based on the results, we further tune the software image as required.

Figure 2: SIT outcomes are sent as an email when the job is finished.
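One way such email notification can be achieved with standard SLURM directives (a sketch only; the job name, partition and address are placeholders, and the SIT tool may implement its notification differently) is via the batch script header:

#!/bin/bash
#SBATCH --job-name=sit-image-test      # placeholder job name
#SBATCH --partition=gpu                # partition(s) supplied by the user
#SBATCH --nodes=2
#SBATCH --mail-type=END,FAIL           # email the user when the job finishes or fails
#SBATCH --mail-user=user@example.org   # executing user's address

bats *.bats                            # run the collected BATS tests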

Conclusions and future work

We have devised a TAP-based tool that can quickly check the suitability of a software image before rolling it out onto the production cluster nodes. The script, as demonstrated above, is simple but robust enough to accommodate as many factors as we wish to test. Our future plans include creating a GUI, possibly a web version, where users can add/remove tests and view the outcomes visually. We are also aiming to use Nvidia’s DCGM tool, which has recently replaced the Validation Suite.

References

  1. Stephenson, S., “BATS”,  https://github.com/sstephenson/bats
  2. Test Anything Protocol  http://testanything.org/
  3. Nvidia Validation Suite http://docs.nvidia.com/deploy/nvvs-user-guide/index.html
  4. Intel Cluster Checker https://software.intel.com/en-us/intel-cluster-checker
  5. Intel MPI Benchmark  https://software.intel.com/en-us/articles/intel-mpi-benchmarks
  6. NAMD https://www.ks.uiuc.edu/Research/namd/

Biographies:

Gerard Kennedy: Gerard is working as a Research Assistant at the Research School of Engineering, ANU, and the Australian Centre for Robotic Vision. He is developing an asparagus-picking robot and is involved with the robot’s perception system, which includes areas such as camera calibration, image segmentation and 3D reconstruction. He has a B.E. in Mechatronics, Robotics and Systems Engineering from the Australian National University.

Ahmed Arefin: Ahmed works within the High Performance Computing Systems Team in Scientific Computing, IM&T, CSIRO. He completed his PhD and postdoc in the area of HPC and parallel data mining at the University of Newcastle, and has published articles in PLOS ONE and Springer journals and in IEEE-sponsored conference proceedings. His primary research interest focuses on the application of high performance computing to data mining, graphs/networks and visualization.

Steve McMahon: Steve McMahon is an IT professional with a strong background in science, software development and IT service delivery. He understands the IT infrastructure needs of scientists and has worked with many. He has worked on negotiating, designing and establishing IT infrastructure for several large scale science projects. He has done major software development in the fields of computational fluid dynamics and biophysics simulation. He was integral in planning and implementing a broad range of data services for the federally funded Australian Research Collaboration Service (ARCS).  Steve is currently working as the Engineering Manager for HPC and Computational Sciences at the DST Group.

The Climate Data Enhanced Virtual Laboratory (Climate DEVL): Enhancing climate research capabilities in Australia

Kate Snow1, Clare Richards2, Aurel Moise3, Claire Trenham4, Paola Petrelli5, Chris Allen6, Matthew Nethery7, Sean Pringle8, Scott Wales9, Ben Evans10

1Australian National University, Canberra, Australia, kate.snow@anu.edu.au
2Australian National University, Canberra, Australia, clare.richards@anu.edu.au
3Bureau of Meteorology, Melbourne, Australia, aurel.moise@bom.gov.au
4Commonwealth Scientific and Industrial Research Organisation (CSIRO), Aspendale, Australia, claire.trenham@csiro.au
5University of Tasmania and ARC Centre of Excellence for Climate Extremes, Hobart, Australia, paola.petrelli@utas.edu.au
6Australian National University, Canberra, Australia, chris.allen@anu.edu.au
7Australian National University, Canberra, Australia, matthew.nethery@anu.edu.au
8Australian National University, Canberra, Australia, sean.pringle@anu.edu.au
9University of Melbourne and ARC Centre of Excellence for Climate Extremes, Melbourne, Australia, scott.wales@unimelb.edu.au
10Australian National University, Canberra, Australia, ben.evans@anu.edu.au

 

A major focus of the Australian climate research community currently is the preparation for and contribution to the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project phase 6 (CMIP6). CMIP6 is an internationally coordinated research activity that provides climate model output from a series of carefully designed and targeted experiments. The analysis of CMIP6 data will form the basis for assessments by the Intergovernmental Panel on Climate Change (IPCC) and inform policy- and decision-makers around the world.

For Australia, CMIP6 will underpin research into historical climate variability as well as projections research into the timing, extent and consequences of future climate change and extreme events. This work may be used to assist Australian government, business, agriculture and industry to manage climate risks and opportunities related to climate variability, change and extremes.

Climate research is computationally demanding and requires data-intensive High Performance Computing (HPC). More than 20 PBytes of CMIP6 data are expected globally, the largest collection of climate data ever produced, of which a substantial portion will be made available and analysed at NCI. The complexity and volume of CMIP6 mean that data management is an impossible task without a national infrastructure approach and a deeply collaborative effort, and NCI is an essential component in realising climate research in Australia. The Climate DEVL addresses the software and data management aspects of these needs, while NCI and leaders from the climate community work to secure funding for the data storage infrastructure needed for the CMIP6 endeavour.

The Climate Data Enhanced Virtual Laboratory (DEVL) has focused on key components of the infrastructure needed to manage this massive data archive and make it accessible for CMIP6-based research in Australia. It builds on previous Australian e-infrastructure programs, the Climate & Weather Science Lab and the National Earth Systems Data Collection and Data Services programs. It also supports NCI’s leading role in international collaborations, most notably the Earth System Grid Federation (ESGF), which provides the international federated capability for CMIP data. This long-running work has required funding from various parties, including the ANDS, RDS and NeCTAR NCRIS programs. The infrastructure has directly supported other major government-funded research investments from CAWCR (the Collaboration for Australian Weather and Climate Research), NESP (the National Environmental Science Program), the ARC Centre of Excellence for Climate System Science (ARCCSS) and the ARC Centre of Excellence for Climate Extremes (CLEX).

The climate data at NCI is provided according to the FAIR principles: Findable, Accessible, Interoperable and Reusable. Providing a FAIR data service for such a large and complex data collection exposes significant data management challenges. NCI’s Data Quality Strategy (DQS) delivers data curation practices that support FAIR standards and interdisciplinary data availability. This service permits streamlined access to and analysis of CMIP6 data, enabling efficient state-of-the-art climate science research to be undertaken.

The unique challenges of CMIP, in both size and complexity, have required new services to be developed and then made available as well-managed operational services. The Climate DEVL has defined and developed mechanisms for improved accessibility and usability of the data. One example is the need to find out what data is available at NCI for use in analysis. This need has been addressed through NCI’s Metadata Attribute Search (MAS). MAS provides consistent access to the information contained in the climate data collections by harvesting the metadata within the millions of self-describing files that constitute the CMIP data collection. The MAS also underpins a Python-based API called CleF, developed by ARCCSS/CLEX, which provides command-line search tools for accessing this data. CleF gives researchers an easy interface to the ESGF search API to discover what CMIP data matching their specified requirements (experiment, variable, etc.) has been published but is not yet available at NCI. The tool will be extended to enable users to then submit a data download request, adding the data to the NCI CMIP6 replica service.
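As a hedged illustration of this kind of search workflow (the option names below are assumptions and should be checked against the CleF documentation; the experiment and variable values are placeholders):

# Illustrative only: consult the CleF documentation for the actual interface.
# Search CMIP6 holdings for monthly near-surface air temperature ('tas')
# from the 'historical' experiment:
clef cmip6 --experiment historical --variable tas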

Another aspect of the Climate DEVL has been to coordinate a community approach to defining the highest-priority CMIP6 data to be replicated in Australia for local analysis, permitting timely development and publication of scientific research papers analysing CMIP6 data as it becomes available. The DEVL also supports the evaluation of various model analysis tools, which provides an opportunity for the community to develop standardised workflows for data analysis contributing to the aforementioned research papers.

The Climate DEVL also provides a home for coordinating the ongoing development and availability of the training materials necessary for a streamlined user experience. The extensive knowledge and interdisciplinary topics that span CMIP mean that effective training is needed, including face-to-face tutorials, online self-paced learning materials, and trainer training. The combined effort of NCI, CLEX, CSIRO and BoM permits such collaborative training efforts to benefit the entire Australian climate science community.


Biography:

Dr Kate Snow: I joined the National Computational Infrastructure (NCI) at the Australian National University in November 2017 as a Research Data Management Specialist. Prior to my position at NCI, I completed a PhD in physical oceanography at the ANU and a two-year post-doc position researching Antarctic ice-sheet dynamics at Edinburgh University, Scotland. I apply my research skills from the climate sciences at NCI to help inform data management practices that benefit climate research in Australia. My current role focuses on providing the support, tools and infrastructure to manage the Coupled Model Intercomparison Project phase 6 (CMIP6) and to give Australian climate scientists the capabilities to undertake state-of-the-art climate science.

The Curtin Institute for Computation – meeting the increasing demand for research software and computing skills across all faculties

Rebecca Lange1, CIC data scientist team2
1Curtin Institute for Computation, Curtin University, Perth, Australia, rebecca.lange@curtin.edu.au
2Curtin Institute for Computation, Curtin University, Perth, Australia, curtinic@curtin.edu.au

 

Abstract

In the era of ever-growing data and interconnectivity, computation fundamentally underpins the majority of internationally competitive research across all fields and disciplines. As the demand for computational skills has grown, so too has the need for dedicated support for the research community. The Curtin Institute for Computation (CIC) was therefore established to meet this increasing demand at Curtin University.

The CIC is a truly multidisciplinary institute, inspiring and fostering collaboration across computer science, engineering, the sciences, business, the social sciences and the humanities. It has five themes: big data analytics, simulation, modelling and optimisation, visualisation, and education.

While the CIC is a virtual institute, it has a core team of data scientists who assist Curtin University researchers across all fields with their computational modelling, data analytics, and visualisation problems. Furthermore, the CIC data scientists are actively involved in creating opportunities for researchers to network and share ideas, and they develop and oversee computational training offered by the institute.

In this e-poster we provide an overview of the structure of the CIC and its achievements since the core data science team became operational in 2016. The poster also offers the opportunity to explore several case studies from across the institute, highlighting the need for, and success of, a central data science team supporting researchers from all fields.


Biography:

Rebecca Lange received her PhD in astronomy from the International Centre for Radio Astronomy Research at the University of Western Australia.

Before Rebecca moved to Australia she studied Astronomy and Physics at Nottingham Trent University where she also worked as a research assistant in scientific imaging for art conservation and archaeology. Her work there included the development and testing of instruments and software for imaging and spectroscopy as well as the organisation and supervision of field trips, which often required liaising with art curators and conservators.

Throughout her studies and research Rebecca has gained extensive programming as well as data analytics and visualisation experience in various programming languages.

Currently she is working as a data scientist for the Curtin Institute for Computation where she helps researchers by providing data analytics and computational support and training.
