Jonathan Yu1, Benjamin Leighton2, Jevy Wang3, Hendra Wijaya4
1CSIRO L&W, Clayton, VIC, Australia, firstname.lastname@example.org
2CSIRO L&W, Clayton, VIC, Australia, email@example.com
3CSIRO L&W, Black Mountain, ACT, Australia, firstname.lastname@example.org
4CSIRO/Data61, North Ryde, NSW, Australia, email@example.com
Discovery of and access to data that support research projects and policy analysis are currently limited. While many services increasingly publish data, researchers and policy analysts find that these data are not easily discoverable or accessible, are not comprehensive, and are not linked with tools and approaches that promote their use. Data providers, in turn, are often disconnected from user groups and lack the ability to capture, attribute and accrue value, which they need to justify business cases for making data more discoverable, accessible, interoperable and reusable. This gap is a barrier to repeatable, evidence-based policy analysis and research in Australia.
CSIRO is developing the Knowledge Network (KN) platform (https://kn.csiro.au), which provides a gateway to data published via a range of data initiatives, including NCRIS and open government data initiatives. KN harvests and indexes known data records from multiple data repositories in government and research. The indexed records are then made available so that anyone can discover, access and share links to data at the collection level and at the individual file or service level, all in one platform.
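The harvest-and-index idea described above can be illustrated with a minimal sketch. It is not the KN implementation: the record fields, the repository names (`repo-a`, `repo-b`), the example URLs and the functions `build_index` and `search` are all hypothetical, chosen only to show how one index can answer queries at both the collection level and the file level.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    url: str


@dataclass
class DatasetRecord:
    identifier: str
    title: str
    source: str  # repository the record was harvested from (hypothetical name)
    files: list[FileEntry] = field(default_factory=list)


def build_index(harvested):
    """Merge per-repository record lists into one index keyed by (source, identifier)."""
    index = {}
    for source, records in harvested.items():
        for rec in records:
            index[(source, rec.identifier)] = rec
    return index


def search(index, term):
    """Return (record, files) pairs. A title match is a collection-level hit and
    returns all files; otherwise only files whose name contains the term."""
    term = term.lower()
    hits = []
    for rec in index.values():
        if term in rec.title.lower():
            hits.append((rec, rec.files))
        else:
            matched = [f for f in rec.files if term in f.name.lower()]
            if matched:
                hits.append((rec, matched))
    return hits


# Illustrative records only; not real KN content.
harvested = {
    "repo-a": [DatasetRecord("a1", "Rainfall grids", "repo-a",
                             [FileEntry("rain_2017.nc", "https://example.org/a1/rain_2017.nc")])],
    "repo-b": [DatasetRecord("b1", "Soil survey", "repo-b",
                             [FileEntry("soil_rainfall_join.csv",
                                        "https://example.org/b1/soil_rainfall_join.csv")])],
}
index = build_index(harvested)
hits = search(index, "rainfall")
```

Here the first record matches on its title (collection level) and the second matches only through one of its files (file level), which is the distinction the platform exposes.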
Having dataset and file level information available in the KN platform gives researchers opportunities to use it in online platforms, including data analytics environments (e.g. virtual laboratories or science gateways), as well as web applications tailored for specific communities. KN is currently being used in the ‘EcoScience Research Data Cloud and Data Enhanced Virtual Laboratory’ project (ecocloud for short) [1] to enable discovery of and access to third-party data for use with the ecocloud compute platform. In particular, KN powers discovery and access via the ecocloud explorer, which displays a tailored set of search results for data relevant to the ecological science domain. This allows ecocloud users, such as researchers or policy analysts, to discover and access relevant data in the ecocloud explorer, which also provides code snippets for using the data in ecocloud compute environments. More generally, the current APIs give other projects and initiatives a means to provide their own tailored view of data drawn from a comprehensive superset that aims to have national coverage.
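As a rough illustration of a tailored view and a generated code snippet, the sketch below filters a superset of records by community keywords and emits a short download snippet for one file. It does not use the KN or ecocloud APIs: the record layout, the keyword set `ECOLOGY_KEYWORDS`, the example URLs and the functions `tailored_view` and `code_snippet` are assumptions made for this example.

```python
# Hypothetical keyword set standing in for "relevant to the ecological science domain".
ECOLOGY_KEYWORDS = {"ecology", "biodiversity", "species", "vegetation"}


def tailored_view(records, keywords):
    """Keep only records whose keywords overlap the community's keywords."""
    wanted = {k.lower() for k in keywords}
    return [r for r in records if wanted & {k.lower() for k in r["keywords"]}]


def code_snippet(file_url):
    """Build a short Python snippet that downloads one file by its URL."""
    filename = file_url.rsplit("/", 1)[-1]
    return (
        "import urllib.request\n"
        f"urllib.request.urlretrieve({file_url!r}, {filename!r})\n"
    )


# Illustrative superset; not real KN content.
records = [
    {"title": "Species occurrence records",
     "keywords": ["Biodiversity", "species"],
     "files": ["https://example.org/data/occurrences.csv"]},
    {"title": "Road traffic counts",
     "keywords": ["transport"],
     "files": ["https://example.org/data/traffic.csv"]},
]

view = tailored_view(records, ECOLOGY_KEYWORDS)
snippet = code_snippet(view[0]["files"][0])
```

A different community would reuse the same superset with a different keyword set, which is the sense in which one index can serve several tailored views.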
Because dataset and file level metadata are also indexed in KN, there are opportunities to develop quantitative surveys of the data landscape, particularly in Australia, to analyse and report on its current state [2,3]. Understanding the current state of the data landscape gives data-driven insight into trends and gaps in data initiatives over time, based on the metadata and the datasets themselves. Specifically, it provides a data-driven picture of emerging topics and activities for specific scientific and research communities as well as public and private sector agencies. Future initiatives can then be assessed for improvement based on these insights.
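A quantitative survey of this kind reduces, at its simplest, to counting indexed records along dimensions such as publication year and topic. The sketch below shows that aggregation on a few made-up records; the field names `published` and `keywords`, the sample values and both functions are hypothetical and do not reflect the KN schema or the surveys reported in [2,3].

```python
from collections import Counter, defaultdict

# Illustrative metadata records; not real KN content.
records = [
    {"published": "2016-05-10", "keywords": ["water", "climate"]},
    {"published": "2017-02-01", "keywords": ["water"]},
    {"published": "2017-11-23", "keywords": ["biodiversity"]},
    {"published": "2018-01-15", "keywords": ["Water", "biodiversity"]},
]


def records_per_year(records):
    """Count records by the year part of their ISO publication date."""
    return Counter(r["published"][:4] for r in records)


def keyword_counts_by_year(records):
    """Map each year to a Counter of lower-cased keywords seen in that year."""
    trend = defaultdict(Counter)
    for r in records:
        year = r["published"][:4]
        trend[year].update(k.lower() for k in r["keywords"])
    return dict(trend)


per_year = records_per_year(records)
trend = keyword_counts_by_year(records)
```

Comparing the per-year keyword counts over time is one simple way to surface emerging topics or gaps; a real survey would also need to handle missing dates and inconsistent keyword vocabularies.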
In this presentation, we provide an overview of the KN technical architecture, its use in a virtual laboratory context, and a discussion around data-driven insights that can be gained from the KN platform to inform a ‘state of the data’ picture for Australia.
[1] EcoCloud, https://www.ecocloud.org.au, accessed 20 June 2018.
[2] Yu, J., et al., Survey of open data and research data in the Australian context via the CSIRO Knowledge Network, eResearch Australasia, Brisbane, Australia, October 2017.
[3] Yu, J., et al., Visualising the Australian open data and research data landscape, Collaborative Conference on Computational and Data Intensive Science 2018 (C3DIS 2018), Melbourne, Australia, May 2018, DOI: 10.13140/RG.2.2.33826.32964.
Dr Jonathan Yu is a data scientist researching information and web architectures, data integration, Linked Data, and data analytics and visualisation, and he applies this work in the environmental and earth sciences domain. He is part of the Environmental Informatics group in CSIRO Land and Water. He currently leads a number of initiatives to develop new approaches, architectures, methods and tools for transforming and connecting information flows across the environmental domain and the broader digital economy, within Australia and internationally.