Siddeswara Guru1, Beryl Morris1
The environment is changing at a faster pace than anticipated due to human-induced activities potentially leading to climate change, loss of biodiversity and seasonal variations. The changes in our environment need to be tracked, studied and predicted so that adverse changes can be mitigated and plans made for a sustainable future. It is also essential to understand how the biological and social systems respond to environmental changes.
There is an increasing need for ecological information to address the challenges described above and other significant social, economic and environmental problems. In addition to state and federal government agencies, several initiatives at regional, national and international scales collect and publish a wide variety of ecosystem science data, such as TERN (www.tern.org.au), NEON (www.neonscience.org), TERENO (www.tereno.net) and ICOS (http://www.icos-cp.eu).
Terrestrial ecosystem data is exceptionally heterogeneous in terms of data types and data collection methods – data are collected by human observers, hand-held devices, in-situ sensors and remote sensing. The observations also vary in spatial scale – point, plot, site, region, biome, state and continent. The data handling mechanisms for these datasets also differ.
One of the challenges is to provide complex data streams in an easily accessible form, with standard data formats and exchange protocols, along with tools and platforms to interrogate, access and analyse data and to share analysis pipelines. The data infrastructure should enable easy access to custom data and information for a particular problem, as well as access to scientific computing, visualisation and toolkits that enable data-driven science. The data infrastructure needs to evolve in response to community needs and provide a robust platform that meets the requirements of not only the ecosystem science community but also interdisciplinary researchers. It should also enable researchers to use the same datasets for multiple applications and allow complete re-use and re-purposing of data and infrastructure capabilities.
The emerging data e-infrastructure should support holistic capabilities that not only store, curate and distribute data but also enable processing, transformation based on user needs, access control, model-data assimilation, tracking of data transformations at different stages of the analysis, and linking of consistent data to different analysis tools and pipelines.
The data e-infrastructure should also move towards big data technologies to deal with the increase in data production by different observing initiatives, and it needs to provide a coherent eResearch platform offering services for research data management and cross-disciplinary collaboration under a flexible governance model. In ecosystem science, this should enable access to large collections of earth, environment and biodiversity datasets that support regional, continental and global issues and policy matters, while supporting on-demand compute and storage needs for data-centric scientific applications; a secure environment for data storage and processing; policies to support data security, privacy and confidentiality; and authentication and authorisation to support virtual collaboration and the building of scientific communities.
A conceptual framework has been developed to identify the significant elements and functionalities needed to bring coherent services to the ecosystem science community. The framework brings together all the data management components, in addition to the computation platforms, to enable users to access data and perform analysis in a data-centric approach. Collective interaction between the different components of the framework will meet the needs of ecosystem science data users. The conceptual framework identifies most of the services required, broadly classified into data sources, transformation services, enabling services, delivery services and governance.
Data sources are an important part of any data e-infrastructure. Data can come from many sources, but we focus on the types of data sources that are most useful in terrestrial ecosystem science; the key is to provide standardised access to different data streams in order to extract information and knowledge. Transformation services are part of the data management suite and transform data into information. The enabling services perform key activities to meet user needs; they sit between user access and raw information. The delivery services are the access points through which users interact with the infrastructure. Service initiation describes how users initiate services to obtain the sample outcomes listed under outcomes.
The framework is structured such that each service fulfils a certain task and runs independently, while the overall framework runs as a set of loosely coupled, collaborative services. The services can be managed by different organisations but can be coupled with other services to meet certain use cases. The interaction between the services is the key.
Figure 1: A conceptual framework to deliver End-to-End services to the ecosystem science community.
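As an illustration only, the loosely coupled arrangement of services described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the framework's implementation: the names `ServiceRegistry`, `plot_source`, `to_celsius`, `only_warm`, `deliver_csv` and `build_registry`, the record layout and the temperature values are all hypothetical and do not come from TERN or any existing system. The sketch shows a data source, a transformation service and an enabling service that share one narrow contract and are chained by name, with a delivery service applied at the end.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical record type: one observation from any data source.
Record = Dict[str, Any]
# Every chained service shares one narrow contract: records in, records out.
Service = Callable[[List[Record]], List[Record]]


@dataclass
class ServiceRegistry:
    """Looks services up by name so callers never depend on them directly."""
    services: Dict[str, Service] = field(default_factory=dict)

    def register(self, name: str, service: Service) -> None:
        self.services[name] = service

    def run(self, names: List[str], records: List[Record]) -> List[Record]:
        # Chain the named services; each one sees only the previous output.
        for name in names:
            records = self.services[name](records)
        return records


def plot_source(_: List[Record]) -> List[Record]:
    # Data source: stands in for a plot-survey data stream (made-up values).
    return [
        {"site": "A", "temp_f": 68.0},
        {"site": "B", "temp_f": 86.0},
    ]


def to_celsius(records: List[Record]) -> List[Record]:
    # Transformation service: harmonise units into a standard form.
    return [
        {"site": r["site"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
        for r in records
    ]


def only_warm(records: List[Record]) -> List[Record]:
    # Enabling service: subset the data for a particular user need.
    return [r for r in records if r["temp_c"] > 25.0]


def deliver_csv(records: List[Record]) -> str:
    # Delivery service: render the records as CSV text for the user.
    if not records:
        return ""
    header = list(records[0].keys())
    lines = [",".join(header)]
    lines += [",".join(str(r[key]) for key in header) for r in records]
    return "\n".join(lines)


def build_registry() -> ServiceRegistry:
    # Each service could be run by a different organisation; here they are
    # simply registered under names so a use case can combine them.
    registry = ServiceRegistry()
    registry.register("source", plot_source)
    registry.register("transform", to_celsius)
    registry.register("subset", only_warm)
    return registry


if __name__ == "__main__":
    result = build_registry().run(["source", "transform", "subset"], [])
    print(deliver_csv(result))
```

Because each service depends only on the shared record contract, one service can be replaced or hosted elsewhere without changing the others, which is the property the framework relies on.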
We envisage that the infrastructure will be hosted on the cloud-based NeCTAR and RDS infrastructure of the Australian Research Data Commons (ARDC). Cloud technologies should simplify the building of such infrastructure by providing access to combinations of IaaS, PaaS and SaaS. Access to multiple nodes of the ARDC would allow the infrastructure to be spread across multiple geographic regions, improving fault tolerance and availability.
This abstract presents some of the future thinking in the development of a more cohesive e-infrastructure to meet user needs. The approach taken is to look at potential use cases and identify the critical services that may be required to meet them. Each service performs certain tasks and is able to interact with other services.
Siddeswara Guru is a program lead for the data services capability of TERN. He has a substantial research and management background and has worked on eResearch projects across multiple domains.