Enhancing the Accessibility of ORCID Public Data, now additionally hosted on Google BigQuery

Mr Simon Porter1, Michele Pasin1, Jared Watts1, Helene Draux1, Julie Petro2, Tom Demeranville2

1Digital Science, London, United Kingdom; 2ORCID

Biography:

Simon Porter is VP of Research Futures at Digital Science, and is a current ORCID board member. He has forged a career transforming university practices in how data about research is used, both from administrative and eResearch perspectives. As well as making key contributions to research information visualization, he is well known for his advocacy of Research Profiling Systems and their capability to create new opportunities for researchers.

Abstract:

Background

ORCID is committed to openness, exemplified by the annual release of its Public Data File since 2012. This dataset, encompassing all public ORCID records, has been downloaded over 190,000 times and serves as a resource for analyzing research community dynamics, scientific migrations, collaboration networks, and ORCID adoption trends. However, the file’s substantial size poses challenges for users lacking advanced data management skills, hindering exploratory analyses.

Objective

To improve accessibility and facilitate data exploration, ORCID partnered with Digital Science to host the 2024 Public Data File on Google BigQuery, a cloud-based data analytics platform optimized for large datasets.

Methods

With the dataset hosted on Google BigQuery, users can now run exploratory analyses directly in the cloud, without downloading and locally processing the entire file. This approach lowers the technical barrier to entry and makes working with the data more efficient.

Results

The beta version of this service is now available, allowing the research community to develop innovative use cases for ORCID data, such as reporting on peer review practices or linking ORCID data with other datasets hosted on BigQuery, whether open (for example, those from the World Bank) or commercial (for example, Dimensions). While the dataset remains freely accessible, users must set up their own Google BigQuery accounts. Google offers a free usage tier, with fees applying beyond certain usage levels. Additionally, Digital Science provides sample queries to help users query different parts of the ORCID dataset efficiently.
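Because usage beyond the free tier is billed, it can help to estimate how much data a query will scan before running it. The following is a minimal sketch in Python, not one of the sample queries provided by Digital Science. The table name `your-project.orcid_public_data.records` and the `country` column are hypothetical placeholders, since the actual dataset path and schema are not given here, and the 1 TiB monthly allowance is an assumption that should be checked against Google's current pricing. The sketch assumes the `google-cloud-bigquery` client library is installed and credentials are configured.

```python
# Assumption: monthly free allowance for on-demand query processing (verify with Google).
FREE_TIER_BYTES = 1024 ** 4  # 1 TiB

# Hypothetical table path; replace with the real ORCID dataset location.
ORCID_TABLE = "your-project.orcid_public_data.records"


def build_country_count_query(table: str = ORCID_TABLE, limit: int = 10) -> str:
    """Build a query counting records per country (the `country` column is hypothetical)."""
    return (
        "SELECT country, COUNT(*) AS record_count\n"
        f"FROM `{table}`\n"
        "GROUP BY country\n"
        "ORDER BY record_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def estimate_bytes(sql: str) -> int:
    """Ask BigQuery how many bytes the query would scan, without running it."""
    # Imported here so the helpers above work without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    # A dry run validates the query and reports bytes scanned at no charge.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


def free_tier_fraction(bytes_processed: int) -> float:
    """Share of the assumed monthly free allowance that a query would consume."""
    return bytes_processed / FREE_TIER_BYTES
```

With credentials in place, `free_tier_fraction(estimate_bytes(build_country_count_query()))` would give a rough sense of how much of the assumed allowance a single exploratory query uses.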
