Working towards the “end-to-end”: Research Data Management Capability Development at UNE

Mr Thomas Reeson1, Dr Paddy Tobias2

1University Of New England, Armidale, Australia,

2Intersect Australia, Sydney, Australia,


The implementation of an enterprise-level research data management solution is cumbersome and complex. Sets of requirements and use-cases in the research data management lifecycle vary considerably, which is further complicated by the differing motivations of parties involved. While the institution is driven by responsibilities in administration and compliance, researchers are seeking greater support to facilitate good research practice and improved outcomes.

As such, many in the sector now recognise that any enterprise-level solution for research data management needs to be “end-to-end”, meaning that it integrates all processes across the research data lifecycle into a single workflow. Simply implementing or leveraging existing systems to address one aspect of the data lifecycle is not enough without working to integrate these with other existing data management processes [1, p. 158].

Despite this, there is a shortage in the sector of support and guidance on end-to-end institutional solutions, even in terms of advice on asking the right questions when scoping them. This lack of holistic, critical guidance engenders frequent second-guessing and wasted investment by institutions.

This paper is delivered mid-way through a 12-month project at the University of New England (UNE) to enhance capabilities and encourage engagement in new methodologies for research data management, using the ANDS Capability Maturity Model as a guide. In the context of the problem outlined above, the paper will present the end-to-end workflow developed at UNE and discuss the thinking behind the solutions put in place. It will also address the difficulties confronted along the way in retrofitting systems integration, gaining researcher buy-in, and establishing standards for the university. The paper will finish by listing a number of questions that are currently unresolved and need answering in the sector in relation to end-to-end institutional support. The paper is intended to be thought-provoking and to generate discussion.


  1. Johnston, Lisa R. Curating Research Data, Volume One: Practical Strategies for Your Digital Repository. 2017.




Thomas Reeson is a recent addition to the University of New England Library’s Research Advisory and Engagement Services team. Thomas has worked at Griffith University, the State Library of Queensland, the University of Southern Queensland, and Queensland University of Technology as the QULOC Graduate Librarian. Thomas also worked as the Paramedic Sciences Liaison Librarian at the University of Queensland. As the Research Data Librarian, Thomas assists UNE researchers with planning, storage, description, discovery, and preservation of research data.

Dr. Paddy Tobias represents Intersect at the University of New England. Paddy has several years of experience working in developing countries as an academic, program director, policy adviser and researcher. As an eResearch Analyst, Paddy works to improve research projects with better adoption of digital solutions. The role covers policy advice, training and engagement for research data management, digital research support and data-intensive research.

Victorian Marine Data Portal (VMDP) – Leveraging the IMOS AODN Portal work

Dr Christopher McAvaney1, Dr Alex Rattray2, Ms Michelle Watson3

1 Deakin University, Waurn Ponds, Australia,
2 Deakin University, Warrnambool, Australia,
3 Deakin University, Geelong, Australia,



Supported by the High Value Collection (HVC) program of Australian National Data Service (ANDS), Deakin University has collaborated with the University of Tasmania (via the Institute of Marine and Antarctic Studies – IMAS) and the Integrated Marine Observing System (IMOS) NCRIS capability to implement an instance of the Australasian Ocean Data Network (AODN) portal.

The newly launched Victorian Marine Data Portal (VMDP) provides access to marine data collected by Deakin researchers, and brings together data collected by various research organisations including DELWP, Parks Victoria, and the CSIRO. All the data is openly accessible, supporting the search and discovery of Victorian marine spatial data by researchers and governments, as well as community groups and the general public. The portal complements regional, national and global knowledge databases including Seamap Australia.

The poster will provide an overview of the project, highlighting benefits and value including:

  • The collection, collation and preservation of Victorian marine habitat data from various agencies
  • The provision of access to important research data via a single portal
  • The implementation of a classification scheme to describe the data in a uniform way, and to facilitate discovery
  • Support for ongoing research in this area by simplifying the discovery process and encouraging serendipitous discovery of research data
  • Recognising and re-using the work undertaken by IMOS in support of the AODN Portal software stack (Java Tomcat, GeoNetwork and GeoServer on a PostgreSQL database with GIS extensions)
  • Ability to use ArcMap/ArcGIS or QGIS to interact with the portal

The portal has been built to support the ongoing ingestion of new marine datasets, and the poster will include a detailed view of the VMDP marine research data lifecycle, from the collection of data via instruments through to ingestion into the portal.
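For readers curious how desktop clients such as QGIS or ArcGIS talk to a GeoServer-backed portal like this, the interaction is typically a plain OGC WFS query over HTTP. The sketch below composes a WFS 2.0 GetFeature request URL using only the Python standard library; the endpoint and layer name are hypothetical placeholders for illustration, not the actual VMDP service:

```python
from urllib.parse import urlencode

def wfs_getfeature_url(base_url, layer, max_features=50):
    """Compose an OGC WFS 2.0 GetFeature request URL for a given layer."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer,        # WFS 2.0 uses typeNames (plural)
        "count": max_features,     # WFS 2.0 replacement for maxFeatures
        "outputFormat": "application/json",
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and layer name, for illustration only
url = wfs_getfeature_url(
    "https://example.org/geoserver/wfs", "vmdp:marine_habitat")
print(url)
```

A GIS client issues essentially the same request when a user adds a WFS layer, then renders the returned features; the companion WMS service serves pre-rendered map tiles instead of raw features.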

The poster will also touch on future work to support marine research expected to be undertaken over the coming months, including:

  • Enhancing collection information through digitisation of analogue video
  • Collaboration with UTAS/IMAS for Seamap Connections (aggregation of aggregations)



Christopher McAvaney is Services Manager, eResearch at Deakin University, responsible for establishing and implementing an eResearch program of work from eSolutions at Deakin University. A key deliverable of Christopher’s work is articulating a range of research services within the ICT service catalogue of the university. Christopher’s role involves working with eSolutions (central ICT), Deakin University Library and Deakin Research (the research office) to ensure that a coherent and consistent approach is followed. An important aspect of his role is collaborating with external partners at the state and national level to build local and global collaboration opportunities. Christopher’s research background is in parallel and distributed systems, in particular applied research around an automated parallelisation tool.

A survey of attitudes, perceptions and experiences around data sharing and the concept of open data in the Australian Earth Science community

Prof. Brent McInnes1, Prof. Joel Cutcher-Gershenfeld2

1Curtin University, Perth, Australia,

2Brandeis University, Waltham, USA,


This work reports on the findings of a 2017 national survey of attitudes, perceptions and experiences around data sharing in the Australian Earth Sciences community. The survey, which is the first of its kind in Australia, provides a benchmark for the adoption and utilisation of open data concepts by Australian Earth scientists, and helps determine where Australia sits on the “open data” spectrum relative to counterparts in the United States and Europe.

A total of 249 Earth Science professionals from academic (69%), government (22%) and industrial/other organisations (9%) participated in the survey.  The responses were evaluated on the basis of self-identification of gender, disciplinary focus (geoscience, eResearch and interdisciplinary) and age cohort.  Notable findings include:

  1. For all respondents, there are large gaps between the importance of finding, accessing, and using data within and across fields/disciplines and the ease of doing so. The gaps are smaller across fields/disciplines, but still present. Interdisciplinary researchers value finding, accessing, and using data within and across fields more than others, while also having a larger gap in perceived difficulty of accessing this data.
  2. Women value finding, accessing, and using data within and across fields more than men, while also having a larger gap on difficulty. The most senior cohort sees using data from other fields as less important than other cohorts do.
  3. For geoscience and interdisciplinary respondents there is not strong perceived support from employers or colleagues for bridging across fields and disciplines, or open sharing and reuse of data. In contrast, those whose primary identity is eResearch do experience such support. Interestingly, the lowest perceived support is among those with the most experience.
  4. The current state of geoscience eResearch infrastructure is not seen as sufficient to ensure effective data preservation. Confidence around eResearch concepts is low, except for respondents that identified as eResearch professionals. All agreed on the importance of improving mechanisms for credit and that tenure/promotion policies are a substantial barrier to creating an open data environment.
  5. Sharing data on physical samples is seen as important by all, and very important by eResearch professionals; however, it is perceived as hard to do. By contrast, the actual sharing of the physical samples themselves is not seen as hard to do.
  6. Geoscientists and interdisciplinary scholars do not see leaders clarifying common directions and aligning efforts in sharing data, models, and software; eResearch professionals do see that leadership and do see it as aligned with their work.
  7. There are challenges around cooperation and open sharing of data within the Geosciences, within eResearch, and between the two. The challenges are even greater when it comes to end-user knowledge and training around accessing open data ecosystems.



Brent is the Director of the John de Laeter Centre (JdLC), a Curtin-based research infrastructure hub that operates $33M of research-grade analytical facilities and employs 25 staff, supporting research, education and training in the minerals, petroleum and environmental sectors.



Creating an Open FAIR-way to Connect Researchers, Publishers and Data Repositories: a New AGU-led Initiative in the Earth and Space Sciences.

Shelley Stall1, Lesley Wyborn2, Erin Robinson3, Brooks Hanson4, Kerstin Lehnert5, Mark Parsons6, Joel Cutcher-Gershenfeld7, Brian Nosek8

1American Geophysical Union, Washington, USA,

2National Computational Infrastructure, Canberra, Australia,

3 Earth Science Information Partnership, Colorado, USA

4American Geophysical Union, Washington, USA,

5Lamont-Doherty Earth Observatory of Columbia University, New York, USA,

6Rensselaer Polytechnic Institute, University of Colorado, Boulder, USA

7Heller School for Social Policy and Management, Brandeis University, Waltham, USA

8Center for Open Science, University of Virginia, Charlottesville, USA,



Open, accessible, and high-quality data and related data products and software are critical to the integrity of published research: they are key to ensuring transparency and to supporting reproducibility and repeatability. Unfortunately, not all research artifacts are saved in such a way that they can first be understood by other researchers reading the publication, and then be reused and repurposed in other research endeavors.

To accelerate this process, the American Geophysical Union and a set of partners representing the international Earth and space science community, including the Coalition for Publishing Data in Earth and Space Sciences (COPDESS), the Earth Science Information Partnership (ESIP), DataCite, the Research Data Alliance (RDA), and the Center for Open Science (COS), have been awarded a grant from the Laura and John Arnold Foundation. The grant will develop a collaborative solution across researchers, journals and repositories that evolves the Earth and Space Science (ESS) publication process to include not just the publication but all research inputs to it and related derived data products, helping to develop a unified process that is efficient and standardised for researchers and supports their work from grant application through to publishing [1].

The aim of the project is to develop and implement a collaborative solution for researchers, journals and repositories that will connect publications in the Earth and space sciences with related data, samples and software in repositories, and then make these connections and data interoperable and discoverable across multiple publishers and repositories. A reference set of best practices will be developed for researchers, publishers, and repositories that will include: metadata and identifier standards; data services; common taxonomies; landing pages at repositories to expose the metadata and standard repository information; standard data citation; and standard integration into editorial peer review workflows.

The solution will include defining and managing the metadata requirements and storage requirements for data and derived products, and the incorporation of the changes needed into the submission and workflows for each publisher.  It will also provide support and oversight of the adoption process, best practices, and continued compliance of the requirements by both repositories and publishers ensuring a sustainable, scalable solution.

The project will be based around the FAIR guidelines developed by [2], which seek to ensure that research artifacts that are input to and/or support the publication process will be Findable, Accessible, Interoperable, and Reusable (FAIR). Research artefacts can include datasets, images, video, software, scripts, models, physical samples, and other tools and technology: all are an integral part of modern-day research, and by providing a persistent identifier for each that can be linked to publications, they supply the supporting evidence for the reproducibility and integrity of the scientific record.

This project will build on existing work of COPDESS [3], ESIP [4], RDA [5], the scientific journals, and domain repositories to ensure that well-documented data, preserved in a repository with community agreed-upon metadata and data standards and supported by persistent identifiers, becomes part of the expected research products submitted in support of each publication. The solution will also ensure that the data and derived products submitted in support of research have documentation that is machine readable and better meets the FAIR data objectives.

In Australia, this initiative was supported by AuScope [6], the Australian National Data Service (ANDS) [7] and the National Computational Infrastructure (NCI) [8]. The first meeting of the Advisory Board will be held in Washington, D.C. on 15 November 2017, followed by a two-day Stakeholder Workshop that will bring together repositories and journals/publishers to work on implementing standards and best practices.



  1. American Geophysical Union Coalition Receives Grant to Advance Open and FAIR Data Standards in the Earth and Space Sciences. Accessed 30 August 2017.
  2. The FORCE11 FAIR data principles. Accessed 30 August 2017.
  3. Coalition for Publishing Data in Earth and Space Sciences (COPDESS). Accessed 30 August 2017.
  4. Earth Science Information Partnership (ESIP). Accessed 30 August 2017.
  5. Research Data Alliance (RDA). Accessed 30 August 2017.
  6. AuScope. Accessed 30 August 2017.
  7. Australian National Data Service (ANDS). Accessed 30 August 2017.
  8. National Computational Infrastructure (NCI). Accessed 30 August 2017.


Lesley Wyborn is a geochemist by training and worked for BMR/AGSO/GA for 42 years in a variety of geoscience and geoinformatics positions. In 2014 she joined the ANU and currently has a joint adjunct fellowship with National Computational Infrastructure and the Research School of Earth Sciences. She has been involved in many NCRIS funded eResearch projects over the years. She is Deputy Chair of the Australian Academy of Science ‘Data for Science Committee’ and is co-chair of several RDA Interest Groups as well as a member of the AGU Earth and Space Science Executive Committee.


CODATA Commission on Standards

Simon J D Cox1, Lesley Wyborn2, Marshall Ma3, Simon Hodson4, Geoffrey Boulton4

1CSIRO, Melbourne, VIC Australia,

2Australian National University, Canberra, ACT Australia,

3University of Idaho, Moscow, Id, USA,

4CODATA, Paris, France,



CODATA, the Committee on Data for Science and Technology, was established in 1966 by ICSU to promote and encourage, on a world-wide basis, the compilation, evaluation and dissemination of reliable numerical data of importance to all fields of science and technology. CODATA has played a particular role in standardizing the values of some of the key physical constants.

CODATA is concerned with all types of data resulting from experimental measurements, observations and calculations in every field of science and technology, including the physical sciences, biology, geology, astronomy, engineering, environmental science, ecology and others. Particular emphasis is given to data management problems common to different disciplines and to data used outside the field in which they were generated.

Researchers across the sciences, the humanities and the social sciences need to create integrated data platforms that interoperate across discipline boundaries and enable access to data by a diversity of users. The use of shared models and vocabularies makes data more easily re-useable, and thus more valuable.

The current landscape sees a variety of approaches to promulgating and maintaining community data models, formats, and vocabularies. These are generally organized within disciplines or groups of disciplines, with limited interoperability and linking between them. The emergence of the linked data paradigm, building on the key technologies of the World Wide Web, provides an opportunity to harmonize both tools and key content. The CODATA Commission on Standards aims to assist the science community to develop a coordinated approach, sharing best practices, and where necessary providing a platform for publication and governance of key cross-disciplinary ontologies and vocabularies.


Simon Cox is a CSIRO research scientist, who has been working on standards related to environmental information since the dawn of the web era, through the Dublin Core Metadata Initiative, Open Geospatial Consortium, ISO/TC 211, INSPIRE, Research Data Alliance, Australian Government Linked Data Working Group and W3C. He was awarded the 2006 OGC Gardels Medal and presented the 2013 AGU Leptoukh Lecture.

Developing a culture which values research data through integrated skills training

Dr Mark Hooper1, Sharron Stapleton1, Katya Henry1, Stephanie Bradbury1

1Queensland University of Technology (QUT), Kelvin Grove, Australia,



“Research Data” is to be the third in a series of research training events developed by QUT Library and the Office of Research Ethics and Integrity. Its development follows the successful format of previous courses, “Authorship and Publication” and “Journal Peer Review”. It will be a two-and-a-half-hour blended learning course comprising lightning talks, animations, interviews, and activities, structured around the research data lifecycle.

The poster shares our progress in developing this novel training, and promoting a culture of strong research data management practices at QUT in the context of a new Research Data Management Strategy. This strategy is an institutional response to federal government research agendas, reviews, initiatives and supporting roadmaps [1], [2], [3], [4]. Research data management continues to be part of the multifaceted and changing landscape of eResearch, and we believe it is important that institutions share learnings that contribute towards best practice.


The aim of our forthcoming course “Research Data” is to provide Higher Degree Research (HDR) students and Early Career Researchers (ECR) with a conceptual framework for understanding the complex world of research data management. It will connect researchers with tools, skills, resources, local peer-to-peer support networks, experts and opportunities for further training.

Participants will be invited to adopt a broad view of research data covering the whole research lifecycle, and then to dive in and out of more specific topics – connecting with resources that provide more information, and tools that may be useful for their specific research activities. In this sense, the overall course structure will follow the format of our previous courses that aimed to unite many individual topics into a coherent schema. For example, “Authorship and Publication” represented the relationships between topics as parts of a subway map (see Figure 1)[5]. “Journal Peer Review” represented individual topics as parts of a great industrial machine comprised of various components, cobbled together over time as illustrated in Figure 1 [5]. In this same way, “Research Data” will give participants a feel for how individual topics and discipline differences are part of a system supporting their research.

Figure 1: QUT Library and Office of Research Ethics & Integrity research training formats

These formats have proved popular, as evidenced by our anonymous feedback surveys:

“The presentation map …will be extremely useful for planning. Already have a space on the wall, as it is such a good visual reminder”

“Will certainly recommend it to others, as it gives a great ‘bird’s eye view’ of the whole process.”

“A well organised, succinct morning. The format was great – moved along well and didn’t get bogged down… All speakers were well prepared and their slides were clear and concise.”

“This was a great session. I learnt more about the publishing process this morning than I have in [my] whole time at [university]. I will be recommending [this] session to all early career academics.”

Following the two previous courses, “Research Data” will aim to integrate research skills with good research practices. In other words, it aims to integrate the “how” and the “why”. For example, the course will be based around the FAIR principles for research outputs: Findable, Accessible, Interoperable, Reusable [6]. But rather than merely explaining these principles in the abstract, the course aims to equip researchers with tools that will help them enact those principles in their various research activities. Our poster shares some of our working ideas in this respect.


  1. Australian Government. National innovation and science agenda. 2015. Accessed 31 Aug 2017.
  2. Australian Government, Productivity Commission. Data availability and use, draft report. 2016. Accessed 31 Aug 2017.
  3. Australian Government, Department of Education and Training. 2016 National research infrastructure roadmap. Accessed 31 Aug 2017.
  4. McGagh, J., Marsh, H., Western, M., Thomas, P., Hastings, A., Mihailova, M., and Wenham, M. (ACOLA). Review of Australia’s Research Training System. Report for the Australian Council of Learned Academies. 2016. Accessed 31 Aug 2017.
  5. Queensland University of Technology. Authorship, publication, and peer review. Accessed 31 Aug 2017.
  6. FORCE11. Guiding principles for findable, accessible, interoperable and re-usable data publishing version B1.0. Accessed 31 Aug 2017.




Mark Hooper is Education and Cultural Change Coordinator for the Office of Research Integrity at QUT.  He has designed and delivered educational materials and curricula across the academic, professional, government, and industry sectors. His PhD is in the field of philosophy and examined David Hume’s account of cognitive error. 

Sharron Stapleton has over twenty years’ experience in information research and management in corporate, academic and government sectors.  She is currently Research Data Librarian at QUT and supports researchers in managing and publishing their data.

Katya Henry is the Research Support Librarian at QUT Library.  Passionate about the Library and Information Science profession, Katya has experience in academic, school and State libraries, together with tertiary teaching and research roles.

Stephanie Bradbury is the Research Support Manager at QUT Library. She coordinates a range of activities that support QUT’s research community, including the library’s researcher skills training workshops, research impact reporting, data management services and scholarly publishing strategies. In the past 20 years, Stephanie has worked in various areas of QUT including the Institute of Health and Biomedical Innovation (IHBI) as Information Manager, and the Research Students Centre as Research Training Coordinator.

MDbox: a cloud-based repository and analysis toolkit for molecular dynamics simulations

Karmen Condic-Jurkic1, Mark Gregson2, Steven De Costa2

1The Australian National University, Canberra, Australia,
2Link Digital, Canberra, Australia,,


Computational modelling has become an integral tool in almost every branch of science, including chemistry and biology. Computational chemistry methods are now widely used to provide a better understanding of molecular processes at the atomistic level, complementing experimental findings. Molecular dynamics (MD) simulations are a powerful technique used to study molecular structure and function by following the movement of atoms over a period of time through solving classical equations of motion. MD simulations are computationally demanding and time-consuming, often requiring supercomputer access and significant scientific input. Unfortunately, the primary data (trajectories) generated in the process are rarely made publicly available beyond the analysis presented in publications and supporting information, remaining locally stored on hard drives or private servers without public access. Considering the human and computational resources used to generate these trajectories, they represent a very valuable asset in molecular studies, especially in the biomolecular and materials sciences. Currently, there are general repositories that allow hosting of research data, such as Figshare, the Open Science Framework (OSF) or Zenodo, but to the best of our knowledge there is no publicly available repository dedicated exclusively to hosting and managing data generated by MD simulations. There are many benefits to having a specialised and centralised repository, including standardised data description and access to large-scale data analysis.

MDbox is envisioned as a specialised open access repository for MD simulation datasets. MDbox aims to provide a platform for sharing trajectories and their corresponding input files, which should improve documentation of commonly used protocols and enhance the replicability and reproducibility of simulations [1]. MDbox can be used for research data management and serve as a long-term storage solution for users. It will make collaboration and data exchange easier, and provide an alternative for making research publicly available and citable. A well-designed metadata schema [2,3] will lead to better discoverability, and the HDF5 file format [4] can be used to store all the relevant simulation data in a single file, further simplifying data search and analysis. We are currently developing a prototype of the repository and are looking to engage the community and work in collaboration with potential users to help us shape the future development of this platform.
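To illustrate the single-file idea, the sketch below bundles a toy trajectory together with descriptive metadata in one HDF5 file. This is a minimal sketch using h5py and numpy; the group layout and attribute names are our own illustrative choices, not a published MDbox schema:

```python
import h5py
import numpy as np

def write_md_archive(path):
    """Bundle a toy trajectory and its metadata into a single HDF5 file."""
    n_frames, n_atoms = 100, 5
    # Toy coordinates with shape (frames, atoms, xyz)
    positions = np.random.default_rng(0).random((n_frames, n_atoms, 3))
    with h5py.File(path, "w") as f:
        traj = f.create_group("trajectory")
        traj.create_dataset("positions", data=positions, compression="gzip")
        traj.create_dataset("time_ps", data=np.arange(n_frames) * 2.0)
        # Descriptive metadata travels in the same file, as HDF5 attributes
        meta = f.create_group("metadata")
        meta.attrs["engine"] = "GROMACS"       # illustrative values only
        meta.attrs["force_field"] = "CHARMM36"
        meta.attrs["timestep_fs"] = 2.0

def read_positions_shape(path):
    """Open the archive and report the trajectory dimensions."""
    with h5py.File(path, "r") as f:
        return f["trajectory/positions"].shape

write_md_archive("md_archive.h5")
print(read_positions_shape("md_archive.h5"))
```

Because coordinates, timestamps and provenance metadata live in one container, a repository can index the metadata attributes for search while streaming only the datasets a user actually requests.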

In our information-driven era, the open data approach is of great value for the further development of computational modelling and for cross-disciplinary researchers in both academia and industry. The growing movement to open up data produced by publicly funded research provides additional incentive. However, the most exciting prospects for MDbox come in the form of new research opportunities and the advancement of molecular modelling, ranging from developing new analytics tools for large datasets to machine learning techniques. Artificial intelligence is already spreading rapidly, providing exciting opportunities in almost every area of human activity, and it is expected to have a major impact in medicine, drug design, protein engineering and the creation of new materials. These methods will require large, curated datasets to produce informative and valuable results – MDbox will provide exactly this.


1. Hinsen, K., A data and code model for reproducible research and executable papers. Procedia Comput. Sci., 2011. 4, p. 579-588.
2. Hinsen, K., MOSAIC: A data model and file formats for molecular simulations. J. Chem. Inf. Model. 2014. 54(1), p. 131-137.
3. Thibault, J.C., Facelli, J.C., and Cheatham III, T.E., iBIOMES: managing and sharing biomolecular simulation data in a distributed environment. J. Chem. Inf. Model., 2013. 53(3), p. 726-736.
4. de Buyl, P., Colberg, P.H., Höfling, F., H5MD: a structured, efficient, and portable file format for molecular data. Comput. Phys. Commun. 2014. 186(6), p. 1546-1553.


Karmen Condic-Jurkic earned her Master’s degree in chemistry at the University of Zagreb, Croatia in 2006. She then worked at the Rudjer Boskovic Institute in Zagreb as a research assistant, followed by a PhD in computational chemistry and biophysics awarded by Friedrich-Alexander University (Erlangen, Germany) in 2013. The same year she joined The University of Queensland in Brisbane as a postdoctoral researcher, staying there until October 2015, when she moved to the Australian National University in Canberra.

Her PhD research was mostly oriented toward molecular modelling of radical enzymes and their mechanisms using various computational methods, including quantum mechanics (QM) methods, molecular dynamics (MD) simulations and hybrid QM/MM techniques. Her postdoctoral research has focused on the structure and function of membrane proteins implicated in multidrug resistance, using classical MD simulations as the primary tool.

Journey without maps: Publishing linked open data for large archival collections

Mr Owen O’Neill1, Mr Daniel Wilksch1

1Public Record Office Victoria, Melbourne, Australia,



In June 2017, Public Record Office Victoria (PROV) implemented a new repository for publishing linked open data about the open access records in our collection. This new infrastructure has enabled us to experiment with semantic web approaches such as the Linked Data Platform (LDP) for describing the contextual information and resources related to records in the collection in a way that maintains their semantic structure while being machine readable.


Public Record Office Victoria is in the process of implementing a range of new digital infrastructure components for facilitating use of the collection. A key part of this infrastructure is a repository using Fedora Commons for storing open access content and other contextual information about the collection. This data falls into four broad categories: extended description, transcription, user generated content and renditions (copies) of records.

The content stored in our Fedora Commons repository is structured as RDF, using a simple data model based on the Portland Common Data Model. This data model enables us to describe content and the relationships between content entities, and to publish the data in a form that can be more easily utilised, interrogated and repurposed.

While the data model meets our initial requirements, we expect to confirm and refine it over time as the amount of data ingested into the repository grows. We are also exploring the appropriate use of widely implemented ontologies. As an archive, we face a potential tension between using ontologies to facilitate interoperability and avoiding overlaying semantic meaning on the content we have custodianship over. This tension would benefit from further consideration and discussion in the sector as the use of semantic web approaches becomes more common.



Owen O'Neill is the Program Manager for Public Record Office Victoria's Digital Archive Program. The aim of the program is to replace the systems Public Record Office Victoria (PROV) uses for maintaining and facilitating access to the PROV collection.

Owen has previously been involved in a number of digital preservation and data management projects in Higher Education and Research.

Researcher engagement and data inventory with the Data Lighthouse

Dr Elena Zudilova-Seinstra1

1Elsevier RDM Solutions, Amsterdam, The Netherlands,


When research data is openly available, the pace of scientific discovery increases: researchers can verify findings, reproduce experiments, or reuse data to generate new findings. Many universities are actively working on the adoption of best data sharing practices to facilitate collaboration, optimize the use of their resources and acquire more funding. To achieve this, infrastructure and engagement are equally important. University libraries are leading these efforts and recommend data sharing tools to their researchers. However, researcher engagement remains a challenge. As a result, adoption rates of in-house RDM services remain relatively low, while adoption of open data repositories is mostly unknown.

To address these problems, we have developed the Data Lighthouse [1], a new service to support reporting and communication about research data between librarians and researchers. The Data Lighthouse enables university libraries to communicate effectively with researchers in order to facilitate compliance with best RDM practices, increase adoption rates of the data sharing tools available at the university, and collect metadata about research datasets stored elsewhere.

The article publication event serves as the main trigger for the service. We reach out to researchers by email to:

  1. Check whether a data sharing solution has already been associated with the article;
  2. If not, recommend storing, sharing, linking and publishing the relevant research data via open data repositories;
  3. Monitor progress and generate dashboards for librarians to assess their university's RDM compliance.
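As an illustrative sketch, the publication-triggered steps above might be modelled as follows. The function names, fields and data structures are assumptions for illustration, not the Data Lighthouse API.

```python
# Toy version of the publication-triggered outreach workflow.
def on_article_published(article, linked_dois, outreach_log):
    """Steps 1-2: check for an existing data link, else recommend one."""
    if article["doi"] in linked_dois:
        return "already-linked"
    outreach_log.append(article["doi"])  # email a repository recommendation
    return "recommendation-sent"

def compliance_rate(articles, linked_dois):
    """Step 3: dashboard metric -- share of articles with linked data."""
    if not articles:
        return 0.0
    return sum(a["doi"] in linked_dois for a in articles) / len(articles)
```

Running the trigger per publication and aggregating `compliance_rate` over time is one simple way such a dashboard could track progress.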

The Data Lighthouse is a new eResearch capability for keeping track of research data and engaging with researchers in a proactive and adaptive manner. In this lightning talk, we will explain the Data Lighthouse functionality and present our main findings from ongoing joint pilots with several research institutes.



  1. Zudilova-Seinstra, E., and de Waard, A., DATA LIGHTHOUSE: a service for Research Institutes to proactively engage with their researchers in the RDM space. RDAP 2017. Available from: , accessed 10 August 2017.


Dr. Elena Zudilova-Seinstra is a Senior Product Manager for Research Data at Elsevier. In her current role, she focuses on delivering tools that help researchers to share and reuse research data.  Before joining Elsevier, Elena worked at the University of Amsterdam, SARA Computing and Networking Services and Corning Inc. Elena holds an MSc degree in Technical Engineering and a PhD degree in Computer Science from the St. Petersburg State Technical University. She co-authored more than 60 research articles and book chapters.

Recommender System Meets Open Data

Dr Anusuriya Devaraju1

1CSIRO, Kensington, Australia,


The adoption of open data in universities, research institutions and government agencies has led to a dramatic increase in the number of open datasets on the Web. As a result of this proliferation, users face the challenge of discovering relevant datasets. Existing data repositories address this challenge through keyword and faceted search. However, these search mechanisms are primarily intended for users who know what they are looking for or are familiar with the structure of the repositories. In addition, they may return results that are too broad or too narrow, making it difficult for users to filter out datasets that are not of interest. Recommender systems are complementary to these search mechanisms. Widely employed on e-commerce sites to improve product discovery and enhance the user experience, they are information filtering systems that present users with recommendations matching their preferences or contexts.

We developed a recommendation approach for a new application area: open data discovery. The approach leverages content-based filtering (CBF) and item-to-item co-occurrence (I2I), tuned with a feature weighting model obtained through a user survey. CBF quantifies the similarity of datasets by comparing their metadata, e.g., title, keywords and location, while I2I considers their statistical co-occurrence, e.g., downloads by the same users. We applied the approach in the context of the CSIRO Data Access Portal and evaluated it through a user study. 113 data users participated in the study and evaluated 216 target datasets. We identified 5 recommendations for each target dataset, giving 1080 relevance judgments in total. The results of the user study reveal the ability of the recommendation approach to accurately quantify the relevance of datasets, which we consider an important contribution to the challenge of discovering relevant open datasets.
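To make the combination concrete, here is a minimal sketch, not the authors' implementation, of blending a weighted CBF metadata similarity with an I2I co-occurrence score. The Jaccard measure, the feature weights and the blending parameter `alpha` are illustrative stand-ins for the model the study derives from its user survey.

```python
# Hybrid scoring sketch: content-based filtering (CBF) + item-to-item (I2I).
def jaccard(a, b):
    """Set-overlap similarity between two metadata field values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cbf_score(meta_a, meta_b, weights):
    """Weighted similarity over metadata fields (e.g. keywords, title terms)."""
    return sum(w * jaccard(meta_a[f], meta_b[f]) for f, w in weights.items())

def i2i_score(item_a, item_b, sessions):
    """Co-occurrence: fraction of download sessions with a that also took b."""
    with_a = [s for s in sessions if item_a in s]
    if not with_a:
        return 0.0
    return sum(item_b in s for s in with_a) / len(with_a)

def recommend(target, catalogue, sessions, weights, alpha=0.5, k=5):
    """Rank every other dataset by a blend of CBF and I2I scores."""
    scores = {}
    for item, meta in catalogue.items():
        if item == target:
            continue
        scores[item] = (alpha * cbf_score(catalogue[target], meta, weights)
                        + (1 - alpha) * i2i_score(target, item, sessions))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical usage with toy data
catalogue = {
    "soil-moisture": {"keywords": {"soil", "moisture"}},
    "soil-carbon": {"keywords": {"soil", "carbon"}},
    "ocean-salinity": {"keywords": {"ocean", "salinity"}},
}
sessions = [{"soil-moisture", "soil-carbon"}, {"ocean-salinity"}]
weights = {"keywords": 1.0}
print(recommend("soil-moisture", catalogue, sessions, weights, k=2))
```

The blend lets metadata similarity surface fresh datasets with no download history, while co-occurrence captures relevance signals that metadata alone misses.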



This talk is relevant to the conference as it addresses the challenge of discovering open datasets. It presents concrete experience in a new application area, namely the development of a recommendation approach to improve the discovery of open datasets.



Anusuriya Devaraju is currently a postdoctoral fellow at CSIRO Mineral Resources. Prior to joining the research centre, she worked as a researcher at the Institute for Bio- and Geosciences, Forschungszentrum Juelich, where she was involved in the data management of the TERENO and TERENO-MED long-term terrestrial observatories. Her research focuses on the discovery of research assets such as datasets, software packages and physical collections in Earth and environmental science using recommender systems, persistent identifiers and semantic technologies.



    About the conference

    eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
