The NLeSC eScience users’ survey: learnings from actually asking actual users about actual use

Mr Guido Aben1

1AARNet, Kensington, Australia, guido.aben@aarnet.edu.au

STOP WORRYING AND LEARN TO LOVE SURVEY DATA

Consensus is developing among eScience policy makers, both in Australia and overseas, that future eScience policy must contain provisions to evaluate eScience deliverables more robustly, both immediately upon delivery and at set intervals thereafter. The suggested indicators typically include metrics of user acceptance, tool penetration and similar “social” values. To date, however, few institutions or service providers (let alone countries) have actually executed large-scale surveys to gather baseline data on the performance, acceptance and penetration of their existing portfolios, nor have large-scale surveys (as opposed to one-on-one interviews, the more traditional method) been conducted to canvass the expectations and predictions of research infrastructure users (across domains) and usage (at all capacity levels).

We are aware of one exception: the 2016 Netherlands eScience Centre survey, conducted during Q4 2015 and presented in February 2016, which drew a highly significant 1048 respondents (9% of the population canvassed).

To the extent that Dutch eScience policy and execution have commonalities with Australian national research infrastructure policy, this survey and its attendant summary report are a veritable treasure trove of insights and learnings, including a number of sobering observations about the efficacy and uptake of eScience tools, services and platforms to date.

This lightning talk aims to present a few salient points and to alert attendees to the availability of an English translation of the 2016 Netherlands eScience Centre survey report.


Biography

Guido Aben is AARNet’s director of eResearch.

In his current role at AARNet, Guido is responsible for building services to meet researchers’ demand, and for generating demand for those services; CloudStor is perhaps the most widely known of them.

Libraries and Digital Humanities Downunder

Ms Ingrid Mason1

1AARNet, Sydney, Australia, ingrid.mason@aarnet.edu.au

DESCRIPTION

This lightning talk will debate a single question:

Why does Australia need to foster the development of regional communities of practice, and to participate in international communities of practice, linking digital humanities researchers and library practitioners, as part of research infrastructure capability development and library support for data-intensive humanities and arts research?

This proposed lightning talk is relevant to this year’s conference because the 2016 NCRIS Roadmap states, in section 1.4 Skills and Career Development, that “There are two elements to successfully utilising world-leading infrastructure. The first is training and development of both facility managers and technical staff… The second element is the skill level of researchers.”

As a guide to the reader:

dh+lib is a community of “librarians, archivists, Library & Information Science graduate students, and information specialists” [1] in the US keen to contribute to the conversation about digital humanities and libraries. The online platform for this community of practice of academic librarians emerged out of an Association of College & Research Libraries (ACRL) “digital humanities” special interest group [2]. The Alliance of Digital Humanities Organisations (ADHO) is an international network of digital humanities organisations, and its Libraries and Digital Humanities SIG is an ADHO special interest group that aims to “foster collaboration and communication among librarians and other scholars doing digital humanities work.” [3]

REFERENCES

  1. About dh+lib. Available from: http://acrl.ala.org/dh/about/, accessed 19 June 2017
  2. ACRL Digital Humanities Interest Group. Available from: http://www.ala.org/acrl/aboutacrl/directoryofleadership/interestgroups/acr-igdh, accessed 19 June 2017
  3. ADHO SIGS. Available from: http://adho.org/sigs, accessed 19 June 2017

Biography

Ingrid Mason, Deployment Strategist with AARNet, provides support for engagement and the uptake of the national research and education network (NREN) and services with AARNet members across the research, cultural and collections sectors. Ingrid has worked on several NCRIS programs: Australian National Data Service, National eResearch Collaborative Tools and Resources, and Research Data Services.

Making Terra-Bytes of data accessible in ‘web-time’!

Mr Uwe Rosebrock1, Mr Simon Pigot1

1CSIRO, Hobart, Australia, uwe.rosebrock@csiro.au

 

ABSTRACT

The Australian Wave Energy Atlas (AWavEA) portal provides access to a 32-year hind-cast of wave data for the Australian region at hourly temporal resolution. In its entirety it comprises nearly 20 TB of data, of which a 5 TB subset is used to provide real-time time-series analysis. On average, a web user expects an asynchronous query to return within tens of seconds. With some simple measures, the supporting data was prepared so that analysis processes return results covering over 300,000 records in under 10 seconds, an improvement of three orders of magnitude over the standard layout of the data.

With the increasing amount of data available, and its cross-disciplinary use, it is no longer feasible to simply copy large data holdings or to query them remotely. Query processes need to sit in front of the data; the NCI data cube is one example. We present simple measures that improve access and make incorporation into spatial portals feasible.
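As a sketch of the kind of simple measure involved (the abstract does not name the exact technique, so this is an assumption): rechunking an archive so that the time dimension is contiguous on disk is a common way to make point time-series extraction fast. The file names, the variable name hs and the chunk sizes below are illustrative, not the actual AWavEA layout.

    # A minimal rechunking sketch (assumed approach, not the actual AWavEA pipeline).
    # Hindcast files are typically chunked one time step at a time, which is fast
    # for maps but slow for time series; rewriting with chunks that are long in
    # time and small in space lets a multi-decade hourly point query touch only a
    # handful of contiguous blocks.
    import xarray as xr

    ds = xr.open_dataset("wave_hindcast.nc")  # illustrative file name

    encoding = {
        "hs": {                          # significant wave height (assumed name)
            "chunksizes": (8760, 4, 4),  # (time, lat, lon): one year of hours per chunk
            "zlib": True,                # compression partially offsets the rewrite cost
            "complevel": 4,
        }
    }
    ds.to_netcdf("wave_hindcast_ts.nc", encoding=encoding)

    # A point time-series query against the rechunked file is now I/O-light:
    hs_series = xr.open_dataset("wave_hindcast_ts.nc")["hs"].sel(
        lat=-42.88, lon=147.33, method="nearest"
    )

Keeping the same data twice, once per access pattern, is a plausible reading of the figures above: a 5 TB time-series-optimised copy of the most queried variables drawn from the 20 TB archive.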


Biography:

Uwe Rosebrock is a Senior Software Engineer at CSIRO Oceans and Atmosphere in Hobart. He has extensive experience in large-scale data processing and software design, as well as in project and defect management. Uwe leads a team of software engineers that developed the ARENA-CSIRO Australian Wave Energy Atlas and its integration into AREMI, and that also developed CSIRO’s relocatable modelling system and the DIVE visualisation packages as part of the CSIRO/BoM/Navy Bluelink program.

What if Data were not Forever?

Mr Rob Cook1, Dr Rhys Francis2

1Pangalax, Bardon, Australia,

2eRF, Diamond Creek, Australia

 

INTRODUCTION

Digital data is an increasingly important ingredient of research [2][3][4][7]. As part of the evolving culture of research, researchers are being asked to plan, manage, publish and retain the data used in their research projects [6]. Methods for doing so are being mandated as part of institutional data management policies. Valuable data is being curated, maintained and used as input to ongoing research, sometimes as a component of substantial data collections with broad interest and use. This development in research methodology is exceptionally valuable: it increases the available stock of knowledge and raises the level of reproducibility of research.

Storing all research data in the manner approved in data management plans means that requirements for data storage repository space are increasing rapidly, and the time and effort required to curate and manage that data are growing similarly.

Ultimately, continuing growth in preserved data is unsustainable: data preservation costs do not decrease as fast as data volumes and complexity increase [9], and there is no obvious answer to who will be willing to pay the escalating bills.
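A simple worked example makes the compounding concrete (the rates are illustrative assumptions, not figures from any of the cited sources): even when unit storage costs fall steadily, the total bill keeps climbing whenever holdings grow faster than unit costs shrink.

    # Illustrative only: assume holdings grow 40% per year while the cost per
    # terabyte falls 20% per year. Total cost then compounds at
    # 1.40 * 0.80 = 1.12, i.e. +12% per year, roughly tripling in a decade.
    growth, unit_cost_decline = 1.40, 0.80

    relative_cost = 1.0
    for year in range(1, 11):
        relative_cost *= growth * unit_cost_decline
        print(f"year {year:2d}: total cost x{relative_cost:.2f}")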

This paper challenges the reader to think about how custodians can rationally and safely reduce total data costs, and how research processes can minimise the effort involved in handling research data. It proposes a simple time scale approach.

 

DATA TIME SCALES

Our contention is that the purposes of data retention can be classified according to the maturity of the research use of the data and the group responsible for its stewardship. The data in each of these states has an associated time scale during which it holds value for its research users. These time scales are observable across the creation and use of research data, and recognising them leads to a practical decomposition of the Australian research data system.

 

Time scale  | State                          | Locus of Stewardship
3-5 years   | Active Research Data           | Researchers and research projects
10-15 years | Openly Exchanged Research Data | Research performing institutions
Decades     | Research Community Data        | Research communities and their supporting institutions
Indefinite  | Stock of Knowledge             | Society at large through governments

 

Budget restraint depends on agreeing that not all research data is of interest over all of these different time scales. In addition, the achievement of meaningful data management for data that is of interest over each of these time scales has different purposes, different access and durability requirements, different cost drivers and different custodial interests. Therefore these time scales help set out components of a design for an Australian research data system by identifying separable purposes within it.

PROGRESSION AND SELECTIVITY

The challenge of storing digital data can be related to: the volume and complexity of the data itself; the quality of its curation; the scale and complexity of the infrastructure required to support its retention, organisation, accessibility and use; the longevity and durability of its preservation; and the degree to which automation can be applied.

Publicly funded research, by virtue of budget realities, has limited scope to cope with an exploding cost of data. Consequently, downward pressure needs to be applied to each of the cost factors in order to maximise the amount of valued data that can be retained for future use. Indeed, it is imperative that the cost of managing data for the publicly funded research sector be significantly sub-linear in the volume, variety and velocity [5] of that data.

The difference in scope and participation can be understood as follows:

  1. Active research data is the ‘working data’ of research projects and relates to any data created or used by researchers in the research projects they undertake.
  2. Openly exchanged research data is the ‘working data’ of the research system itself. It is data that is curated in support of the exchange of knowledge, largely amongst the associated research communities, and for the purposes of improved research integrity, research quality and research reproducibility. It is data that is managed according to best-practice management principles, in order to underpin the global research culture.
  3. Reference research data is the ‘working data’ of broadly based research communities operating over sustained periods of time and cutting across research programs and interest areas. It is often data that would also meet the needs of the social and economic beneficiaries related to those research communities. It would most likely be managed in concert with stakeholders.

These three purposes have different participants as stakeholders, and may justify significantly different costs per data element. The beneficial outcome, the selectivity of data inclusion, the curation of the data sustained, and the durability of the infrastructure and organisational arrangements in support of them, will all be different and drive different costs. The recent call for core data to be identified and treated differently in life sciences [1] is a contemporary example.

The intention is that these components should be conceived of as supporting different states for data in a data system, thereby revealing the transitions between states. Identifying the components of the system allows these critical but missing transitions, and their associated policies, to be developed. Those policies will determine total data system costs.
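To make the state-and-transition framing concrete, here is a minimal sketch; the state labels come from the table above, while the dictionary and function names are our own illustration rather than anything proposed in this paper.

    # Minimal sketch of the four data states and the transitions whose (currently
    # missing) policies determine total data system cost. Names are illustrative.
    STATES = {
        "active":    ("3-5 years",   "researchers and research projects"),
        "exchanged": ("10-15 years", "research performing institutions"),
        "community": ("decades",     "research communities and supporting institutions"),
        "stock":     ("indefinite",  "society at large through governments"),
    }

    # Each edge is a selection decision; data that is not promoted can be retired.
    TRANSITIONS = {
        "active":    ("exchanged", "publication and selective sharing"),
        "exchanged": ("community", "community agreement and FAIR upgrading"),
        "community": ("stock",     "identification as part of the stock of knowledge"),
    }

    def promote(state):
        """Return the next retention state, or None when data exits the system."""
        step = TRANSITIONS.get(state)
        if step is None:
            return None
        next_state, policy = step
        print(f"{state} -> {next_state}: requires {policy}")
        return next_state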

CHALLENGES

The transition from active research data to openly exchanged data involves publication and the selective sharing of published and other completed research data across research teams. How is this selection and sharing accomplished? Does the researcher’s institution accept the cost of making openly exchanged data available?

Promoting data that is accepted as valuable across a research community involves a community-agreed process, and possibly the upgrading of data quality to meet the FAIR principles [10] required by that community. How do communities conduct such a process? To what extent can the application of FAIR principles be automated, to enable more research community data to be accumulated? How do research communities fund the necessary storage? The National Institutes of Health in the US is funding a data commons for molecular biology data [8]. Can this be replicated in other domains?

Some research data qualifies for retention as part of the global stock of knowledge. Is this funded by governments and other central funding agencies because of its value to society?  How should data in this state be identified?

REFERENCES

  1. Anderson et al. (2017), ‘Data management: A global coalition to sustain core data’, Nature 543
  2. European Open Science Cloud High Level Expert Group (2016), Realising the European Open Science Cloud
  3. European Union (2010), Riding the Wave: How Europe Can Gain from the Rising Tide of Scientific Data
  4. Finkel, A. (2017), 2016 National Research Infrastructure Roadmap, Department of Education and Training
  5. Laney, D. (2001), ‘3D Data Management: Controlling Data Volume, Velocity, and Variety’, Technical report, META Group
  6. NH&MRC (2007), Australian Code for the Responsible Conduct of Research
  7. NITRD, NSF (2016), Federal Big Data Research and Development Strategic Plan
  8. NIH, Data Commons, https://commonfund.nih.gov/bd2k/commons
  9. Rizzatti, L. (2016), ‘Digital Data Storage is Undergoing Mind-Boggling Growth’, EE Times
  10. Wilkinson, M.D. et al. (2016), ‘The FAIR Guiding Principles for scientific data management and stewardship’, Scientific Data 3

Biographies

Rob Cook provides eResearch consultancy services through his own company, Pangalax. Until recently he was the CEO of QCIF, the Queensland-based eResearch service provider that operates part of the Nectar research cloud and RDS data storage. Prior to QCIF, Rob worked on a number of large eResearch projects, and before that he was CEO of Astracon Inc., a Denver, CO company that offered telecommunications service management software. In the past Rob has led efforts to found and operate Cooperative Research Centres, including the Distributed Systems Technology Centre and Smart Services.

Rhys spent the first decade of his career as an academic researcher in parallel and distributed computing. The next decade and a half included roles as a senior principal researcher, research programme manager and strategic leader in information and communication technologies in the Commonwealth Scientific and Industrial Research Organisation (CSIRO). His experience includes being the High Performance Scientific Computing Director for CSIRO and the National Grid Programme Manager for the Australian Partnership for Advanced Computing. From 2006 Rhys worked within the Australian Government’s National Collaborative Research Infrastructure Strategy as the facilitator for its investment plan in eResearch and subsequently as the Executive Director of the Australian eResearch Infrastructure Council. Since then through a series of engagements he has continued to work to harness advancing information and communication technologies to the benefit of Australian research.
