Ms Kylie Black¹, Dr Cornelia Hooper², Dr Agi Gedeon³, Ms Katina Toufexis¹, Ms Merrilee Albatis¹, Mr Scott Nicholls¹, Professor A. Harvey Millar²
³Office of Research and Innovation, Edith Cowan University, Perth, Australia
Throughout 2016 and 2017, the University of Western Australia (UWA) Library partnered with researchers in the Australian Research Council Centre of Excellence in Plant Energy Biology on the cropPAL2 project, an Australian National Data Service (ANDS) High Value Collections project. In addition to developing the second iteration of the existing cropPAL database of agricultural protein data (http://crop-pal.org), the project succeeded in establishing a mutually beneficial collaborative partnership between researchers in Plant Energy Biology and librarians in the UWA Library. This paper outlines some of the key benefits that can arise when librarians work with researchers on data-driven projects.
High-throughput plant genomics and advanced breeding methodologies underpin global collaborative efforts to increase food production. Because these growing data resources lack indexing and linking, they are critically under-explored, which limits their potential for use in innovative crop development. The original cropPAL database (compendium of crop Proteins with Annotated Locations) provides open access to protein data for wheat, barley, rice and maize, drawn from data generated over 10 years by more than 300 institutions. The cropPAL2 project extended this platform to further crops of high economic value as food sources (banana, canola, grapevine, potato, sorghum, soybean and tomato), providing access to their protein data and linking it to global plant protein catalogues. The Centre for Plant Energy Biology, in collaboration with the UWA Library and ANDS, successfully built cropPAL2 with funding from the ANDS High Value Collections Project scheme.
BENEFITS FOR THE LIBRARY
The Library contributed to the cropPAL project through librarian expertise in searching and retrieving scholarly literature, formulating and executing complex search strategies, bibliographic database tools, metadata, and research data management. These skills were critical to the success of the project in terms of improving the efficiency of the cropPAL data collation process and ensuring that the dataset was described, stored, and made available in an open access format. There is now an opportunity to further promote these skills and expertise to other parts of the UWA research community.
Involvement in cropPAL provided an opportunity for librarians to contribute directly to a research project with economic, social, health, environmental and academic impact. It also allowed the Library to help the University meet a number of objectives in its Strategic Plan relating to internationally renowned research, including undertaking “research across all our disciplines, focused on issues of relevance to our communities and industries, while generating understanding and solutions of global value” and building “problem-oriented multidisciplinary teams” [1]. The collaboration focused on two areas: search strategy, literature capture/text mining and permissions; and data management and promotion of the resulting cropPAL2 dataset.
The project demonstrated that collaboration between scientists and librarians was highly beneficial for searching, capturing and managing permissions around literature. Being embedded in a research team for an 18-month project gave librarians insight into how research projects are conducted, and especially into the need to balance time spent on research with other tasks such as reporting, attending project meetings and conferences, and writing papers. At a more detailed level, it was very informative to see how researchers search for and manage biological information, and how much importance they place on co-authorship and citations; this understanding enables librarians to better assist other researchers. The close working relationship formed with the scientists in Plant Energy Biology has led to assistance in other areas, such as librarians providing data for grant applications. The project has also produced some offshoot benefits that the Library and Plant Energy Biology will continue to work on together, relating to automating text mining and future updates of the cropPAL collection.
In return, the second area of collaboration was also mutually beneficial, resulting in improved data management. As with the first area, the Library benefited from working closely with researchers on real data, which raised practical questions such as versioning, allocating DOIs, and deciding when a change warrants a new version rather than an update. Scientists involved in cropPAL acted as the senior user group for a Library data repository project, reviewing dataset maintenance and advising on a repository platform migration project aimed at improving the hosting of research datasets.
Additionally, the project provided an extremely valuable and rare opportunity for a librarian to be fully embedded within a research team and the research process. This increased the librarian's confidence in being able to contribute directly and positively to research.
BENEFITS FOR THE RESEARCHERS
Librarian expertise in search strategy and data management was part of the original project proposal and resulted in a number of contributions. In the area of literature searching, the librarian identified functionality within the EuropePMC database (https://europepmc.org/) that supports searching within specific sections of articles, which improved the relevancy of overall search results. The cropPAL database relies on extracting relevant data from the published literature, so efficient searching is critical. While the researchers were already aware of EuropePMC, librarians were able to develop search strings that bypassed the functional limits of the interface. Library staff analysed how section searching could be developed further by referring to the literature [2], and then applied these learnings to the problem of finding protein data for the selected species. Librarians were also able to leverage the results of the literature searches that researchers had carried out when creating the original cropPAL. The scientists in Plant Energy Biology had retained records of all the literature that had been manually assessed for inclusion in or exclusion from cropPAL, which made it an excellent test set for evaluating the effectiveness of search strategies.
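As an illustration of how section-level searching and evaluation against previously screened literature might fit together, the Python sketch below builds a query restricted to named article sections, forms a request URL for the public Europe PMC REST search service, and scores a candidate search against records already judged for inclusion or exclusion. It is a minimal sketch under stated assumptions, not the project's actual search strings or tooling: the helper names, example terms, species and record IDs are hypothetical, and the grouped `SECTION:(...)` query form is assumed from Europe PMC's section-search syntax rather than taken from cropPAL2.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint.
EPMC_SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def section_query(terms, sections, species):
    """Hypothetical helper: match any of `terms` inside the given article
    sections (e.g. "METHODS", "RESULTS"), limited to papers mentioning
    `species`. The SECTION:(...) grouping is an assumed query form."""
    phrase_group = " OR ".join(f'"{term}"' for term in terms)
    section_clauses = " OR ".join(f"{sec}:({phrase_group})" for sec in sections)
    return f'({section_clauses}) AND "{species}"'


def search_url(query, page_size=100):
    """Build the URL for the first page of JSON results; cursorMark="*"
    starts cursor-based paging through large result sets."""
    params = {"query": query, "format": "json", "pageSize": page_size, "cursorMark": "*"}
    return f"{EPMC_SEARCH_URL}?{urlencode(params)}"


def evaluate(retrieved, included, excluded):
    """Score a search against manually screened records.

    Precision is computed only over retrieved records that were previously
    screened, since unscreened hits have no relevance judgement. Recall is
    the share of known included records that the search retrieved.
    """
    retrieved, included, excluded = set(retrieved), set(included), set(excluded)
    judged = retrieved & (included | excluded)
    hits = retrieved & included
    precision = len(hits) / len(judged) if judged else 0.0
    recall = len(hits) / len(included) if included else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical terms and species, for illustration only.
    q = section_query(["GFP", "subcellular localization"], ["METHODS", "RESULTS"],
                      "Solanum lycopersicum")
    print(q)
    print(search_url(q))
    # Hypothetical record IDs standing in for the retained screening records.
    print(evaluate(["PMC1", "PMC2", "PMC3", "PMC4"],
                   included=["PMC1", "PMC2", "PMC3", "PMC5"],
                   excluded=["PMC4"]))  # prints (0.75, 0.75)
```

Restricting terms to methods and results sections is one plausible way to favour papers that report experimental localisation data over those that mention a protein only in passing. Computing precision only over screened records avoids penalising a search for retrieving papers that were simply never assessed.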
The collaboration between researchers and librarians was more complex in the area of research data management, as the cropPAL dataset was used as a test case for data promotion, storage, versioning and management. This formed part of a Library project to decommission an internal system and migrate its data to the UWA Research Repository. In particular, discussion of data inheritance led to new approaches to how data is submitted and linked to UWA staff, to ensure ongoing data maintenance and accountability. Further discussions around data archiving and security influenced the organisation of server and back-up server infrastructure within the wider university system. For the researchers, the benefit was a better understanding of these issues and the ability to apply that understanding to the cropPAL dataset as it was being developed.
The cropPAL2 project has also led to the development of a new approach to data curation and linkage, involving personnel from Plant Energy Biology, the Library and the Office of Research Enterprise. This approach envisages software that can identify data in scientific studies with high precision and automate the process of data capture and linkage. Such software will be developed with commercial distribution in mind and will incorporate several strategies developed during work on both cropPAL projects and the related SUBA project (for Arabidopsis subcellular protein localisations, http://suba.live). This highlights a positive change in attitude towards the value of close collaboration between local institutional library services and lab-based scientists.
[1] The University of Western Australia, UWA 2020 Vision: Strategic Plan: 2014-2020. Available from http://www.web.uwa.edu.au/__data/assets/pdf_file/0010/2538343/114085-VICCHA-StrategicPlan-v3.pdf, accessed 16 June 2017, p. 7.
[2] Kafkas, S., et al., Section level search functionality in Europe PMC. Journal of Biomedical Semantics, 2015, 6(7). DOI: 10.1186/s13326-015-0003-7
Kylie Black is the Senior Librarian for the Faculty of Science at the University of Western Australia. Before joining the UWA Library in 2013, she worked as a Subject Specialist: Music at the State Library of WA and as a librarian at Curtin University. Kylie is a UWA graduate, holding a Bachelor of Arts with Honours in Musicology, and also holds a Graduate Diploma in Information and Library Studies from Curtin University. She has won the Australian Library and Information Association’s Early Career Award for excellence in the first 5 years of her career, and was selected as the Western Australian representative for a Goethe Institute study tour of German libraries. Her interests include measuring research impact and library support for systematic reviews.