eStoRED – A Distributed Platform for Research Data Evaluation, Enrichment and Stories Drafting

Mr Guillaume Prevost¹, Professor Heinrich Schmidt²

¹ RMIT University, Melbourne, Australia, guillaume.prevost@rmit.edu.au
² RMIT University, Melbourne, Australia, heinrich.schmidt@rmit.edu.au

 

In recent years, several eResearch applications have appeared that assist discipline experts in Australian universities in telling stories about research data. We describe eStoRED, a platform that not only helps researchers gather data and quickly pull it together into a meaningful story draft, but also assists them in enriching it with calculations and visualizations. Keeping a focus on research users, the platform adds value to data using calculators and connectors that fuse heterogeneous data. It fits within the research method and alongside the existing eResearch tools that support this process. We first look at some of the use cases eStoRED has served, then describe some uncommon aspects and features that make it valuable as an eResearch platform.

 

RESEARCH FOCUSED

Since the genesis of eStoRED, a key aim has been to add value to data through interpretation by experts in their relevant fields. The platform originated as a tool that provided data related to Australian and South Pacific seaports to support early-stage climate risk assessment and climate change adaptation training/planning [3]. Now more versatile and more mature, eStoRED remains a tool in the hands of the expert.

We briefly look at cases where the platform has been used to enrich research data and facilitate its re-use in different contexts. RMIT University gathers signature data collections used or produced by some of its researchers. These very diverse collections need enrichment so that users browsing them gain an almost immediate understanding of their content and of how they could be re-used for other research purposes.

The first example is a research data collection of tweets posted during the UK riots of 2011. It includes data on the course of the events and on the role of the software in facilitating and shaping the discussion [2]. The data are a snapshot of Twitter activity returned by the Twitter streaming API over a one-week period, comprising over 22 million tweets. The dataset is larger than 100 GB, which makes it difficult to quickly grasp its global traits. eStoRED was used in combination with the Neo4j graph database to pull data from the collection and show an overview of some of its aspects.

The second collection consists of open data from the UK company eCourier: actual movement data of couriers tracked over more than a month. Prof. Matt Duckham et al. [1] used the collection to analyze the possibilities for modeling and reconstructing individual movements or flows based on checkpoint counts at different times.

Driven by researchers’ deep understanding of the data, eStoRED was used to calculate and generate meta data and visualizations for these large datasets. The resulting meta data enriches the collections and makes them much more accessible to other researchers.

 

CALCULATORS

eStoRED stories are composed of annotated “data elements” that connect to data providers via a publish-subscribe system. This design allows a researcher to add specific model-driven calculators with little effort and integrates them seamlessly into the platform without changing the eStoRED software itself. Calculators simply need to be added to a curated catalog that keeps meta data on each of them for discoverability, provenance and re-usability, and that is stored in the MyTardis data curation system [4].
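The following is a minimal sketch of how a calculator might be attached through a publish-subscribe mechanism without touching the host software. It is not the eStoRED implementation: the `Broker` class, the `register_mean_calculator` helper and the topic names are all hypothetical, and an in-memory dictionary stands in for the real messaging system.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class Broker:
    """Tiny in-memory stand-in for a publish-subscribe system (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


def register_mean_calculator(broker: Broker, source: str, target: str) -> None:
    """Attach a calculator that republishes the mean of each incoming batch."""

    def on_data(values: List[float]) -> None:
        broker.publish(target, sum(values) / len(values))

    broker.subscribe(source, on_data)


# Hypothetical topic names; a data element listens on the calculator's output.
broker = Broker()
received: List[float] = []
register_mean_calculator(broker, "sensor.raw", "sensor.mean")
broker.subscribe("sensor.mean", received.append)
broker.publish("sensor.raw", [2.0, 4.0, 6.0])
```

In this sketch the calculator is only known to the broker by the topics it reads and writes, which is what lets new calculators be added without changing the code that publishes or displays the data.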

Visualizers present data from a specific angle, while calculators can apply complex processing to the data they receive. An example of a built-in calculator is the asset risk estimator based on the ISO 31000 risk management standard, with risk and mitigation lists, assessment formulae and enterprise dashboards. Concrete examples of past extensions include infrastructure deterioration models that calculate timber, steel and concrete deterioration under climate adaptation risk scenarios for Australian and Pacific seaports [3], driven by Excel spreadsheets containing the models as formulae.

 

DATA FUSION

Connecting a data element to several heterogeneous data sources enables combining them in a single calculator or visualizer, considerably widening the scope for experimenting with data. This is crucial for a platform used close to data capture, when data analytics is perhaps most powerful: models, hypotheses and evaluations are still being fine-tuned, failure is a key to success, and changes in experiments can accelerate the research most. At this stage, data is still being explored and tinkered with: its format, algorithmic processing and visual presentation are varied to target the academic or sponsor community. The data is also still fresh in the minds of researchers and can be described and documented with the least effort.

A simple example of data fusion in eStoRED was implemented in a proof of concept for the Australia-India Research Centre for Automation Software Engineering (AICAUSE), where the topological organization of a production line in a factory, modeled as static data, is combined in a single visualization with live sensor monitoring data from the production line as it operates [5].
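As an illustration of this kind of fusion, the sketch below joins a static line topology with the latest sensor readings into one structure a visualizer could render. It is an assumption-laden toy, not the AICAUSE proof of concept: the station names, the readings, the `fuse` function and the alert threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Static data: which station feeds which (hypothetical line layout).
topology: List[Tuple[str, str]] = [("loader", "press"), ("press", "packer")]

# Live data: latest reading per station (hypothetical values, degrees Celsius).
readings: Dict[str, float] = {"loader": 41.5, "press": 78.0}


def fuse(
    edges: List[Tuple[str, str]],
    latest: Dict[str, float],
    limit: float = 70.0,
) -> Dict[str, Dict[str, object]]:
    """Annotate every station in the topology with its latest reading."""
    # Collect station names in the order they first appear in the topology.
    stations: List[str] = []
    for src, dst in edges:
        for name in (src, dst):
            if name not in stations:
                stations.append(name)

    view: Dict[str, Dict[str, object]] = {}
    for name in stations:
        value: Optional[float] = latest.get(name)  # None if no reading yet
        if value is None:
            status = "no data"
        elif value > limit:
            status = "alert"
        else:
            status = "ok"
        view[name] = {"value": value, "status": status}
    return view


fused = fuse(topology, readings)
```

The static source decides which stations exist, while the live source only decorates them, so a station with no reading still appears in the fused view instead of silently disappearing.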

 

CONNECTORS

A key extensibility feature of eStoRED is its open architecture, which permits researchers to define connectors to a variety of external services federated around a RabbitMQ service bus. The benefits of connectors include links to RESTful services, live real-time data feeds, data ingestion and conversion scripting as part of connector functionality, and a federated peer-to-peer architecture in a distributed Model-View-Controller pattern. In contrast, many other visualization and story drafting tools depend on local data or databases and, often, on static data.
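To make the ingestion-and-conversion role of a connector concrete, here is a small sketch that turns rows of a CSV feed into JSON messages and hands them to a publish callback. It is only illustrative: the `csv_connector` function, the routing key and the sample rows are hypothetical, and a plain Python list stands in for the service bus so that the example runs without a RabbitMQ client.

```python
import csv
import io
import json
from typing import Callable, List, Tuple


def csv_connector(
    raw_csv: str,
    publish: Callable[[str, str], None],
    routing_key: str,
) -> int:
    """Convert each CSV row to a JSON message and hand it to `publish`."""
    count = 0
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Values stay as strings; type conversion is left to the consumer.
        publish(routing_key, json.dumps(row, sort_keys=True))
        count += 1
    return count


# A list stands in for the message bus (assumption for this sketch).
outbox: List[Tuple[str, str]] = []
sent = csv_connector(
    "port,sea_level_mm\nSuva,12\nApia,9\n",  # made-up sample rows
    lambda key, body: outbox.append((key, body)),
    "ports.sea_level",  # hypothetical routing key
)
```

Passing the publish function in as a parameter keeps the conversion logic independent of the transport, so the same connector body could be pointed at a real message bus client in place of the list used here.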

 

CURATION

eStoRED is an integral researcher-facing part of a platform that includes the MyTardis research data curation system. MyTardis is an application for cataloguing, managing and assisting the sharing of large scientific datasets privately and securely over the web [4].

 

CONTINUITY

The Chiminey parallel sweeper and smart connector to clusters and clouds [6], and the KNIME workflow engine are major components in the software stack surrounding eStoRED. Both components are configured to take curated data and meta data from the MyTardis curation platform [4], push them through predefined analytics processes and cycle the results back to MyTardis with much of the required meta data predefined. This data-centric and cyclic research data process supports the intrinsic model-experiment-evaluate cycle underpinning the scientific method. It places eStoRED as a key research-user-facing front end for adding descriptions, meta data, calculations and scripting, which are all curated themselves. This not only assists in automating the scientific process using existing open-source tools such as KNIME, Chiminey and others, but also supports repeatability and reproducibility of a continuous scientific process.

These are just a few of the strengths of eStoRED. Its versatility allows it to adapt easily to various research domains, and its scalability enables it to work on large amounts of data. Overall, it provides an environment that supports data exploration and evaluation, enables significant enrichment and prepares researchers to tell their stories, whether in publications or in other data sharing tools.

 

REFERENCES

1. Duckham M. et al. (2016), Modeling Checkpoint-Based Movement with the Earth Mover’s Distance. In Miller J., O’Sullivan D., Wiegand N. (eds) Geographic Information Science. GIScience 2016. Lecture Notes in Computer Science, vol 9927. Springer, Cham
2. Pond P. (2016), Software and the struggle to signify: theories, tools and techniques for reading Twitter-enabled communication during the 2011 UK Riots. PhD thesis, RMIT University, January 2016
3. McEvoy, D, Mullett, J, Trundle, A, Hunting, A, Kong, D and Setunge, S (2016), A decision support toolkit for climate resilient seaports in the Pacific region. In Ng, Becker, Cahoon, Chen, Earl and Yang (ed.) Climate Change and Adaptation Planning for Ports, Routledge, London, pp. 215-231.
4. S. Androulakis, J. Schmidberger, M. A. Bate, et al. (2008), Federated repositories of X-ray diffraction images. In Acta Crystallographica Section D, 64(7):810–814, Jul 2008.
5. Prévost G., Blech J., Foster K. and Schmidt H. (2017), An Architecture for Visualization of Industrial Automation Data. In Proceedings of the 12th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-250-9, pages 38-46. DOI: 10.5220/0006289700380046
6. Yusuf I I, Thomas I E, Spichkova M, et al. (2017), Chiminey: Connecting Scientists to HPC, Cloud and Big Data. Big Data Research Volume 8, July 2017, Pages 39-49.

 


Biographies

Guillaume is a Research Software Engineer at RMIT University. He obtained a Master's degree in computer science from the European Institute of Information Technology (Epitech) in France in 2012. He worked in industry with Atos Worldgrid for a year before starting work in eResearch at RMIT in 2013.

Heinz (Heinrich) is a Professor of Software Engineering at RMIT University, where he holds the post of eResearch Director and Director of the Australia-India Research Centre for Automation Software Engineering. He is also Adjunct Professor at Mälardalen University and has been Adjunct Professor at Monash University for several years. Prior to RMIT he worked at Monash University (Professor of Software Engineering and various posts as Head of Department, Centre or Associate Dean), CSIRO and ANU, ICSI UC Berkeley and GMD (now Fraunhofer) in Germany. http://orcid.org/0000-0001-6278-4793

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
