Adding value to research data: Collaboration between APO, VIVO and Research Graph

Presenters: Ms Michelle Zwagerman1, Mr Les Kneebone1

Authors: Amanda Lawrence1, Camilo Jorquera1, Peter Vats2, Michael Conlon3, Amir Aryani2

1Analysis & Policy Observatory, Hawthorn, Australia

2Research Graph, Melbourne, Australia, {peter.vats, amir.aryani}

3University of Florida, Gainesville, Florida, USA,


Analysis & Policy Observatory (APO) is an early adopter of the Research Graph Augment API. In February 2018, APO joined the Duraspace pilot project to test this new cloud-hosted API. APO has used this pilot and the Research Graph technology to augment social science and open policy data using the global network of scholarly works. In this presentation, we report on the outcome of the pilot and describe how the Augment API has added value to APO’s research repository by increasing the number of linked publications and datasets by 71%. We will also present how the Augment API has transformed APO’s data to VIVO RDF and provided a bridge between bibliographic records and semantically enabled data infrastructures. Finally, we will talk about the future roadmap for the social science and open policy data graph, a collaborative project between APO and international partners such as GESIS.


APO was recently ranked the 5th most important repository in Australia and 141st of more than 2,000 repositories around the world (Webometrics 2017)1. APO includes nearly 40,000 records and features the work of over 5,000 organisations and 20,000 authors. The open access database specialises in policy and practice grey literature such as commissioned reports, discussion papers, working papers, briefings, conference papers, evaluations and case studies, but also includes datasets and over 10,000 policy-related journal articles.


A recent collaboration between VIVO and Research Graph [1] developed and demonstrated a repeatable process for using seed data to build first and second order graphs, and to export, transform, and load those graphs in VIVO RDF format to a hosted VIVO instance [2]. As illustrated in Figure 1, this process enriches a research repository’s data by (1) transforming repository data to a graph database, (2) augmenting the graph with the Research Graph data, and (3) making this graph available as a VIVO instance. In February 2018, APO joined a pilot project by Duraspace to leverage this technology [2].
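The idea of growing first and second order graphs from seed data can be illustrated with a minimal sketch. It is not the Research Graph implementation: it assumes that "first order" means nodes one hop from a seed record and "second order" means nodes up to two hops away, and the node identifiers and function names below are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighbourhood(adjacency, seeds, order):
    """Return every node within `order` hops of the seed nodes (seeds included)."""
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(order):
        next_frontier = set()
        for node in frontier:
            # Only keep neighbours that have not been reached yet.
            next_frontier |= adjacency.get(node, set()) - reached
        reached |= next_frontier
        frontier = next_frontier
    return reached


# Hypothetical toy data: one repository record, one author, two external works.
edges = [
    ("apo:report-1", "orcid:author-A"),
    ("orcid:author-A", "doi:article-X"),
    ("doi:article-X", "doi:dataset-Y"),
]
graph = build_graph(edges)
seeds = {"apo:report-1"}

first_order = neighbourhood(graph, seeds, 1)
second_order = neighbourhood(graph, seeds, 2)
```

In this toy graph the first order neighbourhood adds the author to the seed report, and the second order neighbourhood also reaches the article written by that author, while the dataset three hops away stays outside.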

Adding APO content to the Research Graph environment resulted in an immediate increase in links from APO research objects to publications via publishers and authors. APO had 1,462 research objects with persistent identifiers (PIDs) such as ORCID, DOI and ScopusID. The PIDs play a key role in connecting APO content entities within the Research Graph. The Augment API was then able to predict matches with, and assign new PIDs to, research objects, thereby enriching the overall stock of PIDs in the APO repository. After one iteration, links from APO content to external publications grew from 25,959 to 44,542, a 71% increase. With the newly enhanced PID data, further iterations of APO exposure to the Research Graph resulted in a snowballing of connections with research objects in the graph. Figure 2 shows APO’s graph before and after augmentation.
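The role of PIDs in linking records can be sketched with a simple exact-match join on DOIs. This is an illustration only: the Augment API's matching and prediction logic is not described here, and the record structures, identifiers and function names below are hypothetical. The last line reproduces the reported growth in links from the two figures given above.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def link_by_doi(local_records, external_records):
    """Return (local_id, external_id) pairs whose normalised DOIs are identical."""
    index = {}
    for ext in external_records:
        index.setdefault(normalise_doi(ext["doi"]), []).append(ext["id"])
    links = []
    for rec in local_records:
        doi = rec.get("doi")
        if doi:  # records without a PID cannot be linked this way
            for ext_id in index.get(normalise_doi(doi), []):
                links.append((rec["id"], ext_id))
    return links


def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Hypothetical records; the DOIs are made up for illustration.
local = [
    {"id": "apo-1", "doi": "https://doi.org/10.1234/ABC"},
    {"id": "apo-2", "doi": None},
]
external = [
    {"id": "ext-9", "doi": "10.1234/abc"},
    {"id": "ext-10", "doi": "10.5678/xyz"},
]
links = link_by_doi(local, external)

# Link counts before and after one iteration, as reported in the text.
increase = percent_increase(25_959, 44_542)
```

Here only `apo-1` can be linked, because `apo-2` has no PID, which is why assigning new PIDs opens up further links on later iterations. The computed growth is between 71% and 72%, in line with the reported 71% increase.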

Figure 1: Research Graph Augment API.


1 Cybermetrics Lab, 2017, Ranking web of world repositories: Oceania



Figure 2: APO’s graph before and after augmentation

The results of the Augment API can be accessed in a Neo4j graph database. The graph data is also converted into RDF, and the triples are available to search and browse interfaces, including the OpenVIVO interface.
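As a rough illustration of the graph-to-RDF step, the sketch below serialises one graph node as N-Triples. It is not the pilot's export code. The choice of the VIVO `Dataset` class and the BIBO `doi` property is an assumption about how such a node might be typed, and the node dictionary, the `example.org` URI and the DOI are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed mapping: the class and property chosen here are illustrative.
VIVO_DATASET = "http://vivoweb.org/ontology/core#Dataset"
BIBO_DOI = "http://purl.org/ontology/bibo/doi"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def node_to_ntriples(node):
    """Serialise one graph node (a plain dict) as a list of N-Triples lines."""
    subject = f"<{node['uri']}>"
    title = escape_literal(node["title"])
    lines = [
        f"{subject} <{RDF_TYPE}> <{VIVO_DATASET}> .",
        f'{subject} <{RDFS_LABEL}> "{title}" .',
    ]
    if node.get("doi"):
        doi = escape_literal(node["doi"])
        lines.append(f'{subject} <{BIBO_DOI}> "{doi}" .')
    return lines


# Hypothetical node, as it might be read from a graph database.
node = {
    "uri": "http://example.org/dataset/1",
    "title": 'Survey of "open policy" data',
    "doi": "10.1234/abc",
}
triples = node_to_ntriples(node)
print("\n".join(triples))
```

A production pipeline would more likely use an RDF library and the full VIVO ontology mapping; plain string formatting is used here only to keep the sketch self-contained.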

In this talk, we will discuss the augmentation process and the lessons learnt from the pilot. In addition, we will present APO’s graph visualisation and describe the changes that appeared in the graph as a result of linkage to external data sources such as ORCID, Scholix, DataCite, and Crossref.


With a deluge of unstructured documents and diverse data to sift and analyse, researchers working on multidisciplinary public policy issues urgently need new digital research methods and integrated data solutions if they are to provide the evidence needed to have an impact on policy decisions and practices. Augmenting data enables the linkage of previously disconnected information, and this pilot has demonstrated the possibilities. The next step is to integrate the open policy data graph within APO’s existing service offerings. Furthermore, APO is exploring the possibility of expanding its repository of social science and open policy documents by transforming the existing work into a larger graph that includes data from international partners such as GESIS in Germany and the British Library. This engagement is part of a new Research Graph collaborative project to build a domain-specific graph for social science research. As part of this presentation, we will provide further updates on this project.


[1] A. Aryani, M. Poblet, K. Unsworth, J. Wang, B. Evans, A. Devaraju, B. Hausstein, P. Klas, B. Zapilko, S. Kaplun, “A Research Graph dataset for connecting research data repositories using RD-Switchboard”, Nature Scientific Data, Volume 5, Pages 180099, 2018.

[2] M. Conlon, A. Aryani, “Creating an Open Linked Data Model for Research Graph Using VIVO Ontology”, July 24, 2017.


Michelle Zwagerman is the Digital Product Manager for Swinburne’s and the CRC for Low Carbon Living’s Knowledge Hub. She has completed a Master of Public Policy at RMIT, a Master of Business Administration at University of NSW, and a Bachelor of Science at University of Melbourne. She has over 20 years’ experience in Information Technology having delivered numerous IT projects and managed various IT support services.

Les Kneebone has worked in information management roles in government, school, community and research sectors since 2002. He mainly contributed to managing metadata, taxonomies and cataloging standards used in these sectors. Les is currently supporting the Analysis & Policy Observatory by developing and refining metadata standards and services that will help to link policy literature with datasets.
