Hivebench helps life scientists unlock the full potential of their research data

Mrs Elena Zudilova-Seinstra1, Mr Julien Therier1

1Elsevier, Amsterdam, The Netherlands

 

Title Hivebench helps life scientists unlock the full potential of their research data
Synopsis By integrating Hivebench ELN with an institutional repository, or with the free data repository Mendeley Data, you can maximize the potential of your research data and secure its long-term archiving. Hivebench supports compliance with data mandates and the storage of research process details, making results more transparent, reproducible and easier to store and share.

Indeed, storing information in private files or paper notebooks poses challenges, not only for individual life scientists but for their lab as a whole. An Electronic Lab Notebook stores research data in a well-structured format for ease of reuse, and simplifies the process of sharing and preserving information. It also structures workflows and protocols to improve experiment reproducibility.

Format of demonstration Live Demonstration
Presenter(s) Elena Zudilova-Seinstra, PhD

Sr. Product Manager Research Data

Elsevier RDM, Research Products

Target research community Whatever your role in the lab – researcher, PI, lab manager.
Statement of Research Impact Hivebench’s comprehensive, consistent and structured data capture provides a simple and safe way to manage and preserve protocols and research data.
Request to schedule alongside particular conference session
Any special requirements Access to Internet connection.

 


Biography:

I’m a Senior Product Manager for Research Data at Elsevier. In my current role I focus on delivering tools for sharing and reuse of research data. Since 2014 I have been responsible for Elsevier’s Research Elements Program, which focuses on innovative article formats for publishing data, software and other elements of the research cycle. Before joining Elsevier, I worked at the University of Amsterdam, SARA Computing and Networking Services and Corning Inc.

Breathing new life into old collections – Using citizen science to revitalise Geoscience Australia’s microscope slide based collections

Mr John Pring1, Dr Richard Blewett1, Mr Billie Poignand1, Mr Oliver Raymond1, Dr David Champion1, Ms Irina Bastrakova1, Mr Neal Evans1, Mr Peter Butler1, Dr Alastair Stewart1

1Geoscience Australia, Canberra, Australia, john.pring@ga.gov.au, richard.blewett@ga.gov.au, billie.poignand@ga.gov.au, oliver.raymond@ga.gov.au, david.champion@ga.gov.au, irina.bastrakova@ga.gov.au, neal.evans@ga.gov.au, peter.butler@ga.gov.au, alastair.stewart@ga.gov.au

 

DESCRIPTION

Since soon after the federation of Australia in 1901, Geoscience Australia and its predecessor organisations have gathered a significant collection of microscope slide based items (including thin sections of rock and microfossils) from across Australia, Antarctica, Papua New Guinea, the Asia Pacific region and beyond. The samples from which the microscope slides were produced were gathered via extensive geological mapping programs, work conducted for major Commonwealth building initiatives such as the Snowy Mountains Scheme, and science expeditions. The cost of recreating this collection, if that were possible at all, would be measured in the hundreds of millions of dollars (AUD), even assuming that the relevant samples could still be sourced.

While access to these microscope slides is open to industry, educational institutions and the public, it has not been easy to locate specific slides because the management of the collection was based largely on an aged card catalogue and ledger system. The fragmented nature of this system, combined with the increasing potential for deterioration of the physical media and the loss of access to even some of the original contributors, meant that rescue work was (and still is) urgently needed.

Achieving progress on making the microscope slides discoverable and accessible in the current fiscally constrained environment dictated a departure from what might be considered a traditional project approach, and saw the extensive use of citizen science through DigiVol, supported by a small number of onsite volunteers.

Through the use of a citizen science approach the proof of concept project has seen the transcription of some 35,000 sample metadata and data records (2.5 times our current electronic holdings) from a variety of hardcopy sources by a diverse group of volunteers. The availability of this data has allowed for the electronic discovery of both the microscope slides and their parent samples, and will hopefully lead to a greater utilisation of this valuable resource and enable new geoscientific insights from old resources.

One of the other benefits of using DigiVol has been increasing Geoscience Australia’s positive exposure to a totally new section of the general public. It has highlighted the role of the agency to an audience that previously had little or no involvement with the geosciences.

REFERENCES

  1. DigiVol citizen science transcription site available from https://volunteer.ala.org.au/ accessed 1 August 2017
  2. Geoscience Australia eCat Record http://pid.geoscience.gov.au/dataset/112965, created 28 Aug 2017

Biography:

John Pring holds a Masters of Management Studies (Project Management/Technology and Equipment) from the University of New South Wales and an Electrical Engineering Degree from the University of Southern Queensland.

He has been a Senior Project Manager within the Environmental Geoscience Division of Geoscience Australia for some 10 years and has run a number of projects associated with the management of the agency’s data and physical collections over that time.

He has held similar roles within other government agencies prior to joining Geoscience Australia.


Field Acquired Information Management Systems Project: FAIMS Mobile, a customisable platform for data collection during field research

A/Prof. Shawn Ross1, Dr Adela Sobotkova1, Dr Brian Ballsun-Stanton1

1Macquarie University, Sydney, Australia

Title Field Acquired Information Management Systems Project: FAIMS Mobile, a customisable platform for data collection during field research
Synopsis FAIMS Mobile is open-source, customisable software designed specifically to support field research across many domains. It allows offline collection of structured, text, multimedia, and geospatial data on multiple Android devices, and is built around an append-only datastore that provides complete version histories. It includes customisable export to existing databases or in standard formats. It is also designed for rapid prototyping and easy redeployability to reduce the costs of implementation. Developed for ‘small data’ disciplines, FAIMS Mobile is designed to collect heterogeneous data of various types (structured, free text, geospatial, multimedia) produced by arbitrary methodologies. Customised via an XML-based domain-specific language, it supports project-specific data models, user interfaces, and workflows, while also addressing problems shared across field-based projects, such as provision of a mobile GIS, data validation, delivery of contextual help, and automated synchronisation across multiple devices in a network-degraded environment. Finally, it promotes synthetic research and improves transparency and reproducibility through the production of comprehensive datasets that can be mapped to vocabularies or ontologies as they are created.
Format of demonstration Slides / screenshots
Presenter(s) A/Prof Shawn A Ross, Director of Data Science and eResearch, Macquarie University and Co-Director, FAIMS Project.

Dr Adela Sobotkova, Research Associate, Department of Ancient History, Macquarie University and Co-Director, FAIMS Project.

Dr Brian Ballsun-Stanton, Research Associate, Department of Ancient History, Macquarie University and Technical Director, FAIMS Project.

Target research community Researchers in fieldwork disciplines where people (rather than automated sensors) collect data, e.g., archaeology, biology, ecology, geosciences, linguistics, oral history, etc.
Statement of Research Impact FAIMS Mobile has changed users’ daily practice. Case studies indicate that users benefit from the increased efficiency of fieldwork (the time saved by avoiding digitisation more than offsets the time required to implement the system). Born-digital data avoided problems with delayed digitisation, which often occurred long after field recording when the context of records had been forgotten. Researchers reported more complete, consistent, and granular data, and that information could be exchanged more quickly between field researchers and lab specialists, facilitating the evaluation of patterns for meaning. They also observed that the process of moving from paper to digital required comprehensive reviews of field practice, during which knowledge implicit in existing systems became explicit and data was modelled carefully for the first time.
Request to schedule alongside particular conference session  
Any special requirements Nothing special.

Biography:

Shawn A Ross (Ph.D. University of Washington, 2001) is Associate Professor of History and Archaeology and the Director of Data Science and eResearch at Macquarie University.  A/Prof Rossʼs research interests include the history and archaeology of pre-Classical Greece, oral tradition as history (especially Homer and Hesiod), the archaeology of the Balkans (especially Thrace), Greece in its wider Mediterranean and Balkan context, and the application of information technology to research. Since 2009, the focus of A/Prof Rossʼs work has been fundamental archaeological research in Bulgaria. He is a Research Associate at the American Research Center in Sofia, Bulgaria, and supervises the Tundzha Regional Archaeological Project (http://www.tundzha.org), a large-scale archaeological survey and palaeoenvironmental study in central and southeast Bulgaria. Since 2012 A/Prof Ross has also directed the Field Acquired Information Management Systems (FAIMS) project (http://www.faims.edu.au/) aimed at developing data capture, management, and archiving resources for researchers in fieldwork-based disciplines. Previously, A/Prof Ross worked at the University of New South Wales (Sydney, Australia) and William Paterson University (Wayne, New Jersey).


Lightflow – A lightweight, distributed workflow system

Dr Andreas Moll1, Dr Stephen Mudie2, Mr Robbie Clarken3

1Australian Nuclear Science and Technology Organisation, Melbourne, Australia, andreas.moll@synchrotron.org.au

2Australian Nuclear Science and Technology Organisation, Melbourne, Australia, stephen.mudie@synchrotron.org.au

3Australian Nuclear Science and Technology Organisation, Melbourne, Australia, robbie.clarken@synchrotron.org.au

 

Introduction

The Australian Synchrotron, located in Clayton, Melbourne, is one of Australia’s most important pieces of research infrastructure. Light emitted from accelerated electrons, travelling at nearly the speed of light, is utilised by 10 beamlines in order to conduct a very diverse range of research. After more than 10 years of operation, the beamlines at the Australian Synchrotron are well established and the demand for automation of research tasks is growing. Such tasks routinely involve the processing of TB-scale data, online (realtime) analysis of the recorded data to guide experiments, and fully automated data management workflows. In order to meet these demands we have developed Lightflow [1], a generic, distributed workflow system. It has been released as open source software on GitHub and can be found at: https://github.com/AustralianSynchrotron/Lightflow

Architecture

Lightflow models a workflow as a set of individual tasks arranged as a directed acyclic graph (DAG). This specification encodes the direction in which data flows as well as the dependencies between tasks. Each workflow consists of one or more DAGs. While the arrangement of tasks within a DAG cannot be changed at runtime, other DAGs can be triggered from within a task, enabling a workflow to adapt to varying inputs or changing conditions during runtime.
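For illustration only, the sketch below uses NetworkX (the graph library Lightflow builds on, see Implementation) to show how such a dependency graph constrains execution order; the task names are hypothetical and this is not the Lightflow API itself.

    # Illustrative sketch only: shows how a DAG constrains task ordering.
    # Task names are hypothetical; this is not the Lightflow API.
    import networkx as nx

    dag = nx.DiGraph()
    # An edge A -> B means "B depends on the output of A".
    dag.add_edge('watch_directory', 'validate_file')
    dag.add_edge('validate_file', 'compress_data')
    dag.add_edge('validate_file', 'extract_metadata')
    dag.add_edge('compress_data', 'store_result')
    dag.add_edge('extract_metadata', 'store_result')

    assert nx.is_directed_acyclic_graph(dag)

    # Any topological order is a valid serial execution; independent tasks
    # (compress_data, extract_metadata) could run in parallel on separate workers.
    print(list(nx.topological_sort(dag)))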

Lightflow employs a worker-based queuing system, in which workers consume individual tasks. This allows the processing of workflows to be distributed. Such a scheme has multiple benefits: It is easy to scale horizontally; tasks that can be executed in parallel are executed on available workers at the same time; tasks that require specialised hardware or software environments can be routed to dedicated workers; and it simplifies the integration into existing container based cloud environments.
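The routing idea can be sketched with plain Celery, the queuing library Lightflow uses (see Implementation); the task and queue names below are assumptions for illustration and do not reflect Lightflow's internal wrappers.

    # Sketch of queue-based task routing with Celery (names are illustrative,
    # not Lightflow internals). Assumes a redis broker on localhost.
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def compress(path):
        # CPU-bound work that any worker can pick up.
        return f'compressed {path}'

    @app.task
    def gpu_reconstruct(path):
        # Work that should only run on workers with specialised hardware.
        return f'reconstructed {path}'

    # Route the specialised work to a dedicated queue; start a matching worker with
    #   celery -A tasks worker -Q gpu
    compress.apply_async(args=['/data/scan_001'])
    gpu_reconstruct.apply_async(args=['/data/scan_001'], queue='gpu')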

In order to avoid single points of failure, such as a central daemon often found in other workflow tools, the queuing system is also used to manage and monitor workflows and DAGs. When a new workflow is started, it is placed in a special queue and is eventually consumed by a worker. A workflow is executed by sending its DAGs to their respective queues. Each DAG will then start and monitor the execution of its tasks. The diagram in Figure 1 depicts the worker-based architecture of Lightflow.

Figure 1: Worker based architecture of Lightflow

Implementation

Lightflow is written in Python 3 and supports Python 3.5 and higher. It uses the Celery [2] library for queuing tasks and the NetworkX [3] module for managing the directed acyclic graphs. As redis [4] is a common database found at many beamlines at the Australian Synchrotron, it is the default backend for Celery in Lightflow. However, any other Celery backend can be used as well. In addition to redis, Lightflow uses MongoDB [5] in order to store data that is persistent during a workflow run. Examples include the aggregation of values, calculation of running averages, or the storage of flags.
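A plain pymongo sketch of those persistent-store use cases (aggregation, running averages, flags) follows; the collection and field names are invented, and Lightflow wraps this kind of access behind its own data store interface.

    # Sketch of workflow-persistent state in MongoDB (illustrative names only;
    # Lightflow hides this behind its own data store API).
    from pymongo import MongoClient

    db = MongoClient('mongodb://localhost:27017')['workflow_state']
    run = db['runs']

    # Aggregate a value, maintain a running average, and set a simple flag.
    run.update_one(
        {'run_id': 'run-42'},
        {'$inc': {'n_frames': 1, 'intensity_sum': 1532.7},
         '$set': {'calibrated': True}},
        upsert=True,
    )

    doc = run.find_one({'run_id': 'run-42'})
    print(doc['intensity_sum'] / doc['n_frames'])   # running average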

Tasks can receive data from upstream tasks and send data to downstream tasks. Any data that can be serialised can be shared between tasks. Typical examples for data flowing from task to task are file paths, pandas [6] DataFrames or numpy [7] arrays. The exchange of data across a distributed system is accomplished by using cloudpickle [8] in order to serialise and deserialise the data. Lightflow provides a fully featured command line interface for starting, stopping and monitoring workflows and workers. The command line interface is based on the click [9] Python module. An API is also available, in order to integrate Lightflow with existing tools and software.
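The serialisation step can be sketched directly with cloudpickle; the payload below is arbitrary, and the point is simply that anything picklable, such as numpy arrays or file paths, can travel between workers as bytes.

    # Sketch of how task-to-task data can be serialised for transport between
    # distributed workers (how the bytes move over the broker is Lightflow's concern).
    import cloudpickle
    import numpy as np

    payload = {'frame': np.arange(12).reshape(3, 4), 'path': '/data/scan_001.h5'}

    blob = cloudpickle.dumps(payload)      # bytes suitable for a message queue
    restored = cloudpickle.loads(blob)

    assert (restored['frame'] == payload['frame']).all()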

In order to keep Lightflow lightweight, the core library focuses on the essential functionality of a distributed workflow system and only implements two tasks: a generic Python task and a bash task for calling arbitrary bash commands. Specialised tasks and functionality are implemented in extensions. Currently three extensions to Lightflow are available: the filesystem extension offers specialised tasks for watching directories for file changes and tasks covering basic file operations; the EPICS [10] extension offers tasks that hook into EPICS, a control system used at the Australian Synchrotron for operating the hardware devices of the accelerator and the beamlines; and the REST extension provides a RESTful interface for starting, stopping and monitoring workflows via HTTP calls.

Lightflow at the MX Beamline

The two Crystallography beamlines (MX1, MX2) at the Australian Synchrotron have employed a custom-made data management workflow for a number of years. Both the raw and reconstructed data of an experiment are compressed into squashfs files, verified and stored in the central storage system of the Australian Synchrotron. Recently this workflow has been upgraded to use Lightflow in order to take advantage of a distributed system and compress multiple experiments at the same time. The updated setup consists of a management virtual machine that hosts the workflow and DAG queues and acts as a REST endpoint for starting the squashfs workflow. Three physical servers act as squashfs nodes. The workflow is triggered by an HTTP REST call from the experiment change management system at the Crystallography beamlines.
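As a hedged illustration of the kind of shell commands such a bash task might run, the sketch below compresses a hypothetical experiment directory into a squashfs image and lists its contents for verification; the paths and options are placeholders rather than the beamline's actual configuration.

    # Illustrative only: the sort of commands a squashfs bash task might run.
    # Paths are hypothetical; the options used at the beamline may differ.
    import subprocess

    src = '/data/mx2/experiment_1234'           # hypothetical experiment directory
    dst = '/staging/experiment_1234.sqsh'

    subprocess.run(['mksquashfs', src, dst, '-comp', 'xz', '-noappend'], check=True)

    # Basic verification: list the archive contents without extracting it.
    subprocess.run(['unsquashfs', '-l', dst], check=True)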

Lightflow at the SAXS/WAXS Beamline

Several data processing pipelines are implemented using Lightflow for the SAXS/WAXS beamline. An example is the phaseID pipeline, which identifies diffraction peak positions within SAXS profiles and infers the most likely space group. This enables researchers to rapidly determine phase diagrams for self-assembled lyotropic liquid crystal systems, which are important for drug delivery and controlled release.
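As a rough illustration of the peak-identification step (not the actual phaseID implementation), the sketch below locates diffraction peaks in a synthetic 1D SAXS profile with scipy and prints the peak-position ratios that would be compared against known space group signatures.

    # Illustrative peak finding on a synthetic 1D SAXS profile (not the phaseID code).
    import numpy as np
    from scipy.signal import find_peaks

    q = np.linspace(0.01, 0.5, 2000)                      # scattering vector
    profile = np.exp(-q * 10)                             # smooth background
    for q0 in (0.10, 0.17, 0.20):                         # synthetic Bragg peaks
        profile += 0.5 * np.exp(-((q - q0) / 0.002) ** 2)

    peaks, _ = find_peaks(profile, prominence=0.1)
    q_peaks = q[peaks]
    print('peak positions:', q_peaks)
    print('ratios q/q1   :', q_peaks / q_peaks[0])        # compared against known phase ratios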

Summary and Outlook

Lightflow is a lightweight and distributed workflow system written in Python and has been released as open source software on GitHub. It is currently used at several beamlines at the Australian Synchrotron for managing data or implementing data processing pipelines. The next steps are to extend the use of Lightflow at the Australian Synchrotron to the experiment change management at beamlines, complex data management workflows and auto processing workflows at the Crystallography beamlines.

References

  1. Lightflow. Available from https://github.com/AustralianSynchrotron/Lightflow, accessed 22 Jun 2017.
  2. Celery. Available from http://www.celeryproject.org, accessed 25 June 2017.
  3. NetworkX. Available from https://networkx.github.io, accessed 28 June 2017.
  4. redis. Available from https://redis.io, accessed 15 June 2017.
  5. MongoDB. Available from https://www.mongodb.com, accessed 25 June 2017.
  6. pandas. Available from http://pandas.pydata.org, accessed 17 June 2017.
  7. numpy. Available from http://www.numpy.org, accessed 25 June 2017.
  8. cloudpickle. Available from https://github.com/cloudpipe/cloudpickle, accessed 23 June 2017.
  9. click. Available from http://click.pocoo.org, accessed 25 June 2017.
  10. EPICS. Available from http://www.aps.anl.gov/epics, accessed 25 June 2017.

 


Biography

Andreas is the leader of the software engineering team at the Australian Synchrotron in Melbourne. His and his team’s work comprises the development of experiment control systems, scientific software, data pipelines and data management tools. Before being allowed to spend his days writing Python code and learning about microservices, he had to go through a 6 year Fortran and C++ bootcamp in his PhD.

New functionality in the CloudStor platform — an update and roadmap

Mr Guido Aben1

1AARNet, Kensington, Australia

 

CloudStor: from data platform to research hub

Research informed roadmap

AARNet has laid out a roadmap for CloudStor based on feedback from researchers from diverse domains, with varying data storage, movement and technology requirements, and on observation of users and usage patterns. CloudStor stores MB- to TB-sized data, data arriving all at once or in batches, data deposited or made accessible directly by hand or via machine-to-machine interfaces, and data as an input to or output of science, humanities, arts and social science research. In order to make life easier and more productive for a range of researchers using CloudStor, AARNet is evaluating and will roll out a range of technologies and platforms, as steps toward transforming the infrastructure from a sync app into a research hub.

This presentation will discuss the following technologies and services: Jupyter Notebook[1], Kaltura[2], and direct S3-interface[3] bulk storage access, as candidates for integration that aid the evolution of CloudStor from a data storage platform into a research hub and serve diverse research infrastructure requirements.

From sync app to research hub

CloudStor was initially conceived as a data movement and synchronisation platform to operate at AARNet line speed (~10Gbps) and synchronise across the Australian continent. Meeting these basic data infrastructure requirements at the outset in the design was a daunting prospect and an important piece of national research infrastructure foundation for AARNet to lay down in service of research and education.  What has emerged is a consolidated and stable platform that supports day-to-day data movement and working storage operations of a sizable number of researchers (~37,000 as at June 2017).

Two new patterns of user/usage have emerged that reflect the shift from synch app to research hub:

  1. shared storage space (groups of researchers, administrators, and infrastructure specialists) enabling collaboration and multi-party data handling
  2. direct data transfer by machines (as system users) as an efficient step in the research and data curation lifecycles

Researchers and data infrastructure support specialists are actively using the platform to define groups of collaborators and share specific subsets of their data between specific users. Examples of these groups are: PARADISEC, the Australian Data Archive, the Australian Antarctic Division (for ice acoustic data) and several Centres of Excellence who use groups to bridge between academic and industry participants. As evinced by data management patterns (and confirmed through interviews with users), we have also discerned that CloudStor users (as data sources) are both humans and machines. We are finding that instruments have been set up to upload data directly into CloudStor (albeit logged in as a user, as mandated through AAF policy). These observations – the uptake of group functionality, and group membership that includes both humans and machines – reveal that CloudStor is no longer used exclusively as a “personal cloud folder” and platform, but is becoming a research data hub.

Augmenting the research hub ecosystem

Triggered by the research demand described above, AARNet is investigating several technologies and platforms as candidate systems to integrate or interface with, guided by:

  • International evidence of research value (access to computational notebooks)
  • Engagement with researchers across all domains, and in particular the humanities, arts and social sciences (access to multimedia data processing), and scientists using instruments generating big data (access to bulk storage).

Integration of Jupyter with CloudStor: Peer e-infrastructure service providers, notably CERN, have received high demand for a computational notebooks service that is tightly coupled with cloud storage; at CERN, this combined service is called SWAN[4]. Integrating these pieces of research infrastructure enables researchers to execute relatively simple computation and data manipulation on the active data in cloud storage, with no need to download the data, undertake the compute elsewhere, and re-upload the results into storage. Scripts used for computation can themselves be kept, versioned and run directly from CloudStor; the resultant system turns CloudStor into a cloud data manipulation engine. AARNet has a trial version of a “SWAN-like” service on the CloudStor roadmap.
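As a notebook-style sketch of the idea, the snippet below reads a file held in CloudStor over its WebDAV interface, computes a summary, and writes the result straight back; the endpoint URL and credentials are assumptions for illustration only and do not describe the planned SWAN-like service.

    # Notebook-style sketch: work on active CloudStor data in place over WebDAV.
    # The endpoint URL and credentials below are assumptions for illustration.
    import io
    import requests
    import pandas as pd

    WEBDAV = 'https://cloudstor.aarnet.edu.au/plus/remote.php/webdav'  # assumed ownCloud-style endpoint
    AUTH = ('researcher@example.edu.au', 'app-password')               # hypothetical app password

    resp = requests.get(f'{WEBDAV}/project/measurements.csv', auth=AUTH)
    resp.raise_for_status()

    df = pd.read_csv(io.BytesIO(resp.content))
    summary = df.describe()

    # Push the derived result straight back into cloud storage.
    requests.put(f'{WEBDAV}/project/summary.csv', data=summary.to_csv().encode(), auth=AUTH)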

Multimedia curation/viewing via plugin (through Kaltura): Large multimedia holdings are being stored in CloudStor, notably by groups of HASS researchers. In interviews with these research colleagues, we have discovered that direct previewing of these files from their cloud storage platform would be beneficial. In addition to this basic file viewing requirement, we understand that annotation, geo-fencing and rights management would further enhance the value of CloudStor as a multimedia data processing platform. As a result of this engagement with researchers, the implementation of a Kaltura plugin node has been added to the CloudStor roadmap, and we will be working with selected HASS researchers to fine-tune the offering.

Direct one-way (upload-only) transfer of oversize datasets:

For data generated in a day-to-day “trickle” pattern, the sync-and-share paradigm enabled by the CloudStor client apps works well. A different paradigm is being tested within CloudStor for data generated by large science instruments and transferred into storage, for two reasons. (1) The instrument data does not need to be kept in sync; the data just needs to be uploaded (and is never downloaded back to the instrument). (2) For raw performance, better interfaces exist than the WebDAV protocol used by the sync clients in CloudStor. AARNet is currently trialling direct S3 bulk-storage access to the CloudStor data vaults with a number of selected research groups.
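A minimal sketch of what instrument-side bulk upload via an S3-compatible interface could look like is shown below, using boto3; the endpoint, bucket and credentials are placeholders, as the trial's actual interface details are not described in this abstract.

    # Sketch of instrument-side bulk upload via an S3-compatible interface.
    # Endpoint, bucket and credentials are placeholders, not the actual trial settings.
    import boto3

    s3 = boto3.client(
        's3',
        endpoint_url='https://s3.cloudstor.example.edu.au',   # hypothetical endpoint
        aws_access_key_id='INSTRUMENT_KEY',
        aws_secret_access_key='INSTRUMENT_SECRET',
    )

    # Upload-only pattern: the instrument pushes each finished dataset and never syncs back.
    s3.upload_file('/instrument/output/run_0421.h5', 'raw-data-vault', 'runs/run_0421.h5')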

[1] http://jupyter.org/

[2] https://corp.kaltura.com/deployment-options/kaltura-community-edition

[3] https://en.wikipedia.org/wiki/Amazon_S3#S3_API_and_competing_services

[4] https://cds.cern.ch/record/2158559

 


Biography

Guido Aben is AARNet’s director of eResearch. He holds an MSc in physics from Utrecht University.

In his current role at AARNet, Guido is responsible for building services to researchers’ demand, and generating demand for said services, with CloudStor and CloudStor+ perhaps the most widely known of those.

“AARNet X” – a 100Gbps Pathfinder Network for Unique and Evolving Research Needs

Mr Brett Rosolen1, Mr David Wilde1, Mr Chris Myers1, Mr Warrick Mitchell1

1AARNet, North Ryde, Australia, brett.rosolen@aarnet.edu.au

 

Data intensive research needs high capacity frictionless networks that can reliably and consistently deliver very large research data transfers without detrimental impacts on other uses of the network.

Much has already been done to enhance Australia’s National Research and Education Network, AARNet, to make this frictionless networking a reality. The national backbone now operates at 100Gbps, multiple 100Gbps services are in place across the Pacific, and the Science DMZ architecture has been implemented at 10Gbps rates at a small number of campus networks. The latter has created the potential for very large data flows to consume all the available bandwidth for a campus; separating this bandwidth from general campus capacity allows data transfers at line speed while ensuring business continuity and unfettered research use.

However, the game changes completely when individual sites are connected at 100Gbps. Testing to date has demonstrated that single flows can consume a very significant portion of this capacity, creating the opportunity for extremely large data flows (dubbed “elephant flows”) between research infrastructure services, and greatly increasing the likelihood that research flows may impact business continuity.

A possible solution to this dilemma is to provide network capacity specifically for data intensive science, by enhancing the network so that business-as-usual traffic traverses paths that are separated from research flows.

AARNet’s new pathfinder network infrastructure, AARNet-X (AX) is designed to address this challenge, and to support extreme, unique and evolving customer requirements. It will also enable AARNet to develop expertise with new platforms and technologies.

This talk will identify the science drivers and subsequent design approach of the AARNet X network and how our community can use it to freely move data for better science outcomes.

 


Biography

Brett Rosolen is the Data Program Manager, eResearch at AARNet

Digitisation Workflows for Research

Ms Ingrid Mason1, Mrs Sarah Nisbet2

1AARNet, Yarralumla, Canberra, Australia, ingrid.mason@aarnet.edu.au

2eResearch South Australia, Adelaide, Australia, sarah.nisbet@ersa.edu.au

 

DIGITISATION FOR RESEARCH

Digitisation, the scanning of physical material, is employed in research institutions (for research) using different technologies and techniques (e.g. magnetic resonance imaging or DNA barcoding) to produce research objects in a range of digital outputs (e.g. XML or MPEG4) and dimensions (e.g. 2D-4D). Direct and indirect working relationships operate between the researcher (and the research activity), the digitisation service and the material holder. There are institutional digitisation services provided via university faculties and libraries, and services provided for the university by third parties. The material to be digitised may be owned and held by the university, on loan, or owned by or in the custody of a third party. Depending on the researcher’s requirements for digitised material, the workflows for digitisation can range from self-help and booking internal access to scanning technologies, to complex and multi-party arrangements.

RDS CULTURES AND COMMUNITY PROJECT

This presentation will summarise a case study – the digitisation workflows for research arising from the NCRIS-funded Research Data Services project Cultures and Community, led by eResearch South Australia in partnership with Griffith University (the Prosecution Project, led by Prof Mark Finnane), the Tasmanian Archives and Heritage Office, Queensland State Archives, and VicNode. The case study will cover:

  1. Digitisation workflows for research, where the material to be digitised is in the custody of multiple third parties (cultural institutions), is digitised “in-house” within those cultural institutions, and the digital output (the research object) is conveyed to the research institution.
  2. The practical challenges that arise in arranging and conveying digitised material from multiple cultural institutions to the research institutions, using portable media (external hard drive), web download (file transfer protocol), and CloudStor (file sender service).
  3. The lessons learned on the types of tools and services (bridging research infrastructure) that can be leveraged when data providers and digitisation services are third parties working outside of the research sector.
  4. The benefits of establishing a national framework that documents digitisation workflows and guidance information for research, ranging from basic internal institutional processes to multi-party interests and inter-institutional collaboration, in particular for research in the humanities, arts and social sciences (HASS).

CULTURAL HERITAGE DIGITISATION

Digitisation is a core feature of Australian cultural institutions’ collection access work programmes and has been for over two decades. Major drivers for digitisation in cultural institutions are preservation, research, education, publication, and exhibition. Prioritisation of work is a careful balance of institutional and community interests. For example, there is a priority for digitisation where there is impact:

  • High public demand for access
  • Risk to the viability of material (through physical decline)
  • Culturally significant material
  • Research and educational value

Impact considerations include:

  • Strategic alignment
  • Usage level
  • Collection integrity
  • Scholarship support
  • Community service

Institutional collection access services provided to the community can, and coincidentally do, also serve scholarly access needs to digital cultural collections. Where scholarly needs are discrete, however (i.e. focused on particular research and education agendas and programme requirements), access to collections (and digitisation requests) needs to be negotiated and planned in partnership with collection custodians in cultural institutions.

Digitisation of Australia’s cultural collection material serves the interests of HASS researchers, through collaboration and partnering with the cultural sector.  Examples of content aggregation (of digitised cultural heritage and research collection material as an archive) with discovery services that have arisen to support HASS research directly or indirectly are:

  • AustLii – maintained by UTS and UNSW Faculties of Law [1]
  • AustLit – maintained by University of Queensland with collaboration partners [2]
  • Historical and Colonial Census Data Archive – maintained by the Australian Data Archive, Australian National University [3]
  • Australian Policy Online – maintained by Swinburne University [4]
  • Founders and Survivors – maintained by University of Melbourne with collaboration partners [5]
  • PARADISEC – maintained by University of Sydney, University of Melbourne and the Australian National University [6]
  • Prosecution Project – maintained by Griffith University [7]
  • Trove – maintained by the National Library of Australia [8]

REFERENCES

  1. About AustLii. Available from: http://www.austlii.edu.au/austlii/, accessed 20 June 2017
  2. About AustLit. Available from: https://www.austlit.edu.au/austlit/page/5961886, accessed 20 June 2017
  3. About Historical and Colonial Census Data Archive. Available from: https://www.ada.edu.au/historical/about, accessed 20 June 2017
  4. About APO. Available from: http://apo.org.au/about, accessed 20 June 2017
  5. About the Project. Available from: http://foundersandsurvivors.org/project, accessed 20 June 2017
  6. About Us. Available from: http://www.paradisec.org.au/about-us/, accessed 20 June 2017
  7. About. Available from: https://prosecutionproject.griffith.edu.au/about, accessed 20 June 2017
  8. About Trove. Available from: http://trove.nla.gov.au/general/about, accessed 20 June 2017

Biographies

Ingrid Mason, Deployment Strategist with AARNet, provides support for engagement and the uptake of the national research and education network (NREN) and services with AARNet members across the research, cultural and collections sectors. Ingrid has worked on several NCRIS programs: Australian National Data Service, National eResearch Collaborative Tools and Resources, and Research Data Services. http://orcid.org/0000-0002-0658-6095

Sarah Nisbet is eRSA’s Marketing and Communications Manager. Sarah began her career delivering communications solutions in the health care sector where she mastered the art of working across institutions, departments and organisational silos.  She specialises in delivering creative and innovative marketing and communication solutions and has managed local and national projects for eRSA, NeCTAR, NeAT, AeRO and the State Government of South Australia.

Koalas, Floods and Tulips: Environmental monitoring with long range, low power sensor networks

Mr Nick Cross1, Mr Peter Elford1, Catherine Caruana-McManus2

1AARNet, North Ryde, Australia, peter.elford@aarnet.net.au

2Meshed, Castlecrag, Australia, Catherine@meshed.com.au

INTRODUCTION

Emerging Internet of Things (IoT) technologies are for the first time enabling relatively low-cost deployment of environmental sensor networks that can capture urban liveability data in real time, enabling a city liveability index. This has the potential to significantly inform development agendas and improve world-class sustainable design approaches and progressive public policies. Furthermore, open source technology across the IoT technology stack presents a significant opportunity to generate massive real-time data sets at relatively low cost. Critical to the sensor network is the ability for this data to seamlessly integrate with e-research infrastructure, as well as support advanced visualisation via IoT platforms and analytics. Machine learning and cognitive systems are also required in order to process the “streams” of data generated by ubiquitous IoT networks.

This presentation will cover the following topics:

  1. Setting the scene for IoT and e-research (drivers, differentiators in light of existing methods of data connectivity, challenges, data categorisation)
  2. New IoT connectivity technologies – featuring a technical overview and use cases of LoRaWAN
  3. Democratising the Internet of Things – the role that public access IoT networks play as the enabler of e-research for large real-time sensor networks supporting the key sectors of smart cities, built environment, energy, water and environment
  4. Standards, interoperability and open data sharing models as enablers for IoT
  5. The role of IoT research data network infrastructure and storage
  6. Case studies:
    1. Tulip – Air Quality Monitoring IoT Network – Sydney
    2. Early Flood Warning Systems featuring the Flood Network – Oxford
    3. Koala Protection – Gold Coast Hinterland
Key themes: internet of things, low power long range wireless networks (LPWAN), LoRaWAN, industrial internet of things, automation, advanced visualisation, air quality monitoring, urban heat island, noise pollution.

CASE STUDY: TULIP

Tulip is an urban-scale sensing system that will enable researchers, city leaders, environmentalists, urban planners, residents, governments and the built environment industry to monitor and examine Sydney’s environment, infrastructure and activity at street scale, including detecting trends and changes over time. The goal of Tulip is to measure the “health” of the city in sufficient detail to provide data that helps engineers, scientists, policymakers and residents work together to make Sydney, including its suburbs, and other Australian cities healthier, more liveable and more efficient. Tulip is an open data initiative using a network of sensors that measure a range of environmental indicators including air quality (NOx, SOx, CO and particulate pollution), heat, noise and the numbers of people moving through spaces. By combining these datasets, Tulip can provide a liveability index for the urban landscape and address the impact of climate change, the increased density of our cities, and energy and water security issues. Through the generation and creative use of data relating to the health of the city, Tulip is the key to catalysing action and informing planning and policy decisions with hard data for sustainable cities, social cohesion and economic prosperity.

CASE STUDY: KOALA PROTECTION

As urban expansion continues, koalas face ever-increasing threats to their survival. Since European settlement of Australia, more than 50 per cent of koala habitat has been destroyed and much of what remains has been degraded and fragmented. Degradation of habitat can occur through factors such as selective logging of preferred koala food trees, weed invasion and inappropriate fire regimes. Fragmentation of habitat can lead to isolation of individuals and populations. However, the most immediate threats to koalas are domestic dogs and feral animals.

Near real-time monitoring of 38 foot-hold traps designed to trap feral animals is being trialled by rangers in the remote Gold Coast hinterland. The project is being delivered by Smarter Technology Solutions, using the Meshed LoRaWAN network that has been deployed across the entire municipality of the City of Gold Coast. Many of the traps are in areas with no 3G coverage, so using the radio spectrum to transmit data enables more timely information to be transmitted and greatly assists conservation and research efforts.

CASE STUDY: FLOOD MONITORING

Having timely information about river level rises can mean the difference between asset protection and safety on the one hand, and costly flood inundation on the other. A network of LoRaWAN river level sensors can report rapid rises in river levels with multiple data points on different tributaries, creating an overall picture through granular data sets and visual maps. Furthermore, the IoT-enabled flood mapping application can deliver email and text message alerts to pre-determined recipients (e.g. SES crews) for rapid response and early warning. In this way, residents of flood prone areas get more time to protect their assets, move vulnerable people out of harm’s way, and maybe even save lives.

Once a flood is in progress, river level data will continue to be automatically reported without the need to send people into flood-affected areas to collect it manually. The data that is now being automatically gathered every day can be used in a data warehouse to provide fodder for predictive analytics engines and combined with other sources of information such as Bureau of Meteorology data.
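A toy sketch of the alerting logic described above follows: recent river-level readings are compared against a rate-of-rise threshold and pre-determined recipients are notified. The threshold and the notification hook are invented for illustration and are not part of any deployed system.

    # Toy sketch of rate-of-rise alerting on river level readings (threshold and
    # notification hook are invented for illustration).
    from datetime import datetime, timedelta

    def rising_too_fast(readings, threshold_mm_per_hr=100):
        """readings: list of (timestamp, level_mm), oldest first."""
        (t0, l0), (t1, l1) = readings[0], readings[-1]
        hours = (t1 - t0) / timedelta(hours=1)
        return hours > 0 and (l1 - l0) / hours > threshold_mm_per_hr

    def notify(recipients, message):
        for r in recipients:                   # stand-in for email/SMS delivery
            print(f'ALERT to {r}: {message}')

    now = datetime.now()
    readings = [(now - timedelta(minutes=30), 420), (now, 510)]   # 90 mm rise in 30 min

    if rising_too_fast(readings):
        notify(['ses-duty-officer@example.org'], 'Rapid river level rise detected')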

CASE STUDY: OXFORD FLOOD NETWORK

The Oxford Flood Network is a citizen-based initiative for water-level monitoring sensors – a “guerilla network” in the spirit of the crowdsourced Japan Radiation Map created by the public around Fukushima in response to a lack of official information. In the floodplain of Oxford, members of the local community are installing their own water-level monitoring sensors and sharing local knowledge about rivers, streams and groundwater to build a better, hyper-local picture of the situation on the ground.

 


BIOGRAPHY

Catherine Caruana-McManus is a global expert in smart cities and digital transformation and has a career spanning 25 years across government, telecommunications, IT and advisory.

Catherine is a Director of Meshed, a Sydney based IoT integration company and the founder of Giant Ideas for Smart Cities, a global community for smart cities, the new energy economy and the internet of things.  Recently, Catherine has been recognised by Prime Minister Malcolm Turnbull’s Knowledge Nation initiative as one of Australia’s leading thinkers and innovators in big data and smart cities.

Catherine is on the Advisory Board for UoW SMART Faculty and is on the Executive Council and the Chairperson of the Smart Cities and Industry Engagement Work Stream of the IoT Alliance Australia.

Catherine’s prior roles include being the Director of IoT for Energy and Resources for KPMG, Director of IBM’s Smarter Cities and has held other executive positions for MC2 Consulting, PMP Limited and Telstra.

As a serial disrupter, Catherine has been intimately involved in launching many successful internet businesses such as Australia’s first real estate portal,  where.com and whitepages.com.au. Catherine holds qualifications in urban planning, economics, management and finance.
