Mr Nicholas May1, Dr Ian Thomas2
1RMIT University, Melbourne, Australia, firstname.lastname@example.org
2RMIT University, Melbourne, Australia, email@example.com
The submission of datasets to open data repositories needs to be managed, to assure the quality of data and to ensure that only appropriate datasets are accepted. However, this review process can be both onerous and inefficient. Therefore, a system that supports and manages the submission and review of proposals would speed the review of proposals and promote the openness of data repositories. Data Reviews Online is a step in this direction.
At RMIT University, the eResearch Office and the Library have been providing resources to researchers for the promotion, publication, and sharing of their datasets, through a merit-based allocation of resources called the Research Data Grants Program (RDGP). In this program, the selection of proposals is based on two factors: significance and strategic alignment. The significance of the dataset is assessed on criteria defined by Russell and Winkworth [1], such as completeness, rarity, research potential, and artistic merit. The alignment is assessed against RMIT University’s research strategy, which is embodied in its Enabling Capability Platforms (ECP) [2]. Given these diverse criteria, identifying reviewers with the appropriate expertise is essential.
The proposal submission and the ingestion planning for the RDGP have been partially implemented, via a Google form and a Microsoft Word document, respectively. Although the initial number of submissions was low, the overall management of this process was manual and was found to be onerous, especially the review and selection processes. Hence a system that manages the submission and review process would be a boon for the RDGP and would allow the program to expand the number of calls issued and submissions processed. In addition, it could attract further deployments by the wider eResearch community, since this sort of program is expected to grow across Australia and around the globe.
The review of submissions into data repositories is important to maintain quality and focus. Lawrence et al. [3] surveyed example procedures and proposed a generic checklist for data reviews, which included: quality of the data and its metadata, availability of and access to the data, the reliability of the data source, and the potential user community. Whilst this checklist is more complete than the review criteria established for the RDGP, some of its criteria are not required. For instance, the quality of metadata is not an issue for proposals to the RDGP, because resources are specifically provided to help with the extraction and refinement of metadata for the selected datasets.
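As a rough illustration of how a program such as the RDGP might tailor the generic checklist, the sketch below drops criteria that do not apply. The criterion identifiers and the `ReviewChecklist` class are hypothetical names chosen for this example; they are not taken from Lawrence et al. or from Dareon.

```python
from dataclasses import dataclass

# Criteria from the generic checklist summarised above. The identifiers
# are hypothetical labels invented for this sketch.
GENERIC_CHECKLIST = (
    "data_quality",
    "metadata_quality",
    "availability_and_access",
    "source_reliability",
    "potential_user_community",
)


@dataclass(frozen=True)
class ReviewChecklist:
    """A per-program checklist derived from the generic one (assumed design)."""
    name: str
    excluded: frozenset = frozenset()

    def criteria(self):
        """Return the generic criteria that still apply to this program."""
        return [c for c in GENERIC_CHECKLIST if c not in self.excluded]


# The RDGP funds metadata extraction and refinement, so that criterion is dropped.
rdgp = ReviewChecklist("RDGP", excluded=frozenset({"metadata_quality"}))
print(rdgp.criteria())
```

Keeping the generic list fixed and recording only the exclusions makes it easy to see how each program deviates from the common baseline.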
There is currently no generic solution for the submission and review of data repository proposals. EasyChair [4], a system to manage calls, submissions, and peer review of conference papers, has been available since 2002, and there are solutions for specific repositories (several examples are provided by Lawrence et al. [3, p12]). But no generic solution comparable to EasyChair exists for research data. A generic solution would allow repository owners to expand, or restrict, the community of available data sources and reviewers, and to match them based on multiple criteria, such as Field of Research (FoR) classification [5]. However, the review requirements of specific domains, such as required metadata formats, and the automatic ingestion and verification of the data, could not be accommodated, given the need to support a wide range of repositories.
THE SYSTEM: DAREON
The initial development goals were to establish the framework for the overall system, including the development infrastructure and core functionality. In addition, we aimed to lay the foundation for an open-source project, through a deployment and testing infrastructure, and thus support the ongoing development of the platform, using established best practices for community-driven open-source software.
The result, Data Reviews Online (Dareon) [6], is a web-based application that assists in the submission and review of proposals for the inclusion of datasets into a data repository. It helps with the management of calls for proposals and the associated proposal review process. An initial, high-level use case diagram is shown in Figure 1. This shows the three main user roles: Repository Owner, Dataset Owner, and Domain Reviewer. The screenshots provided show sample details for:
- a repository [Figure 2],
- a call for proposals [Figure 3],
- and a proposal [Figure 4].
Figure 2: a sample Repository
Figure 3: a sample Call for Proposals
Figure 4: a sample Proposal
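The relationship between the three roles and the repository, call, and proposal entities can be sketched as a small data model. This is a minimal illustration only: the class names, fields, and permission checks below are assumptions made for the example and do not describe Dareon's actual schema or code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    REPOSITORY_OWNER = "Repository Owner"
    DATASET_OWNER = "Dataset Owner"
    DOMAIN_REVIEWER = "Domain Reviewer"


@dataclass
class User:
    name: str
    role: Role  # assumption: one role per user, to keep the sketch short


@dataclass
class Proposal:
    title: str
    submitted_by: User
    reviews: list = field(default_factory=list)  # (reviewer name, comment) pairs

    def add_review(self, reviewer, comment):
        # Hypothetical rule: only Domain Reviewers may review proposals.
        if reviewer.role is not Role.DOMAIN_REVIEWER:
            raise PermissionError("only a Domain Reviewer may review a proposal")
        self.reviews.append((reviewer.name, comment))


@dataclass
class Call:
    title: str
    proposals: list = field(default_factory=list)

    def submit(self, title, owner):
        # Hypothetical rule: only Dataset Owners may submit proposals.
        if owner.role is not Role.DATASET_OWNER:
            raise PermissionError("only a Dataset Owner may submit a proposal")
        proposal = Proposal(title, owner)
        self.proposals.append(proposal)
        return proposal


@dataclass
class Repository:
    name: str
    owner: User
    calls: list = field(default_factory=list)

    def open_call(self, title):
        """Create a call for proposals attached to this repository."""
        call = Call(title)
        self.calls.append(call)
        return call


# Usage with placeholder names.
repo = Repository("Sample Repository", User("owner_a", Role.REPOSITORY_OWNER))
call = repo.open_call("Sample Call for Proposals")
proposal = call.submit("Sample Proposal", User("researcher_b", Role.DATASET_OWNER))
proposal.add_review(User("reviewer_c", Role.DOMAIN_REVIEWER), "Meets the criteria.")
```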
A feature of this system is the ability to classify repositories, datasets, and reviewers, using multiple, concurrent classification schemes. In the RDGP example, the classification schemes used are the significance criteria and ECP alignment for datasets, and the ECP alignment for reviewers. In the sample repository details (shown in Figure 2), the classification scheme used is the ANZSRC FoR codes [5]. In future development, these classifications will enable the smart matching of reviewers with datasets.
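Since reviewer matching is described as future work, the following is only one possible sketch of how concurrent classification schemes could drive it: each dataset and reviewer carries a set of codes per scheme, and reviewers are ranked by the number of codes they share with the dataset. The function names, scheme keys, and codes are all hypothetical placeholders.

```python
def match_score(dataset, reviewer):
    """Count the codes a dataset and a reviewer share, across all schemes.

    Both arguments map a scheme name to a set of codes in that scheme.
    Returns the total overlap and the shared codes per scheme.
    """
    shared = {}
    for scheme, codes in dataset.items():
        overlap = codes & reviewer.get(scheme, set())
        if overlap:
            shared[scheme] = overlap
    return sum(len(codes) for codes in shared.values()), shared


def rank_reviewers(dataset, reviewers):
    """Return reviewer names by descending overlap, omitting non-matches."""
    scored = [(match_score(dataset, codes)[0], name)
              for name, codes in reviewers.items()]
    # Sort by score (highest first), then by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored if score > 0]


# Placeholder classifications under two concurrent schemes.
dataset = {"ecp": {"ecp-a"}, "for": {"for-01", "for-02"}}
reviewers = {
    "reviewer_1": {"ecp": {"ecp-a"}, "for": {"for-02"}},
    "reviewer_2": {"for": {"for-01"}},
    "reviewer_3": {"ecp": {"ecp-b"}},
}
print(rank_reviewers(dataset, reviewers))
```

A simple overlap count treats every scheme equally; a repository owner might instead want to weight schemes differently, which this sketch does not attempt.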
The outcome of this project is a system, called Dareon [6], which manages the processes that govern soliciting proposals for the inclusion of new datasets into institutional research data repositories. The system oversees the proposal submission, review, selection, and ingestion planning processes, and supports the workflows for the three principal roles. The project has been established as an open-source platform that provides a generic solution and will support future development across the eResearch community.
[1] Roslyn Russell and Kylie Winkworth. “Significance 2.0: A guide to assessing the significance of collections.” Collections Council of Australia, 2009.
[2] Enabling Capability Platforms, RMIT University. Available from: https://www.rmit.edu.au/research/research-expertise/our-focus/enabling-capability-platforms, accessed 29 Jun 2017.
[3] Lawrence, Bryan, et al. “Citation and peer review of data: Moving towards formal data publication.” International Journal of Digital Curation 6.2 (2011): 4-37.
[4] EasyChair: The Conference system. Available from: http://easychair.org, accessed 29 Jun 2017.
[5] Classification Codes. Available from: http://www.arc.gov.au/rfcd-seo-and-anzsic-codes, accessed 29 Jun 2017.
[6] Data Reviews Online. Available from: http://github.com/dareon-org, accessed 29 Jun 2017.
Nicholas May is a software developer in the eResearch Office of RMIT University. He has over twenty-eight years of varied experience in software engineering, across industries and domains, and holds Certified Professional status with the Australian Computer Society. His current role includes responsibility for promoting research data management across the research lifecycle. http://orcid.org/0000-0002-1298-1622