FAIR for Jupyter Notebooks – A Practical Guide
Aleem Uddin¹, Muhammad Ali¹, Richard Ferrers¹, Robin Burgess¹, Sonia Ramza¹, Matthias Liffers¹, Tom Honeyman¹, Paula Martinez¹
¹Australian Research Data Commons
Abstract
Jupyter Notebooks (JNs) are widely used in research and data science. Despite this extensive use, making JNs FAIR (Findable, Accessible, Interoperable, and Reusable) remains challenging. Applying the FAIR principles to JNs would benefit both their creators and their users. However, the FAIR principles are high-level aspirations for robust research and can be difficult to apply in practice. To address this problem, a team of specialists in the research sector met to formulate an approach and concluded that a practical guide, accompanied by an example of a FAIR JN, was needed. The guide was therefore created by drawing on existing initiatives such as FAIR for Research Software (FAIR4RS) and on relevant facets of publicly available FAIR best practices.
The guide makes recommendations addressing each component of FAIR. For example, the Findable (F) component is addressed by making a JN broadly available through storage in a version control repository such as GitHub or GitLab, while the other FAIR components can be addressed by adding appropriate licensing, containers for reproducibility, and persistent identifiers (DOIs). The outcome of this work is a guide specific to JNs, which are used broadly in research with various programming languages (Python, R, Julia, etc.) and have unique reproducibility requirements. Additionally, platforms and services such as JN and BinderHub help make JNs reproducible and user friendly. This guide will be made openly available to the research community via ARDC’s website.
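As a rough illustration of how the recommendations above might be checked in practice, the following Python sketch scans a repository directory for files that commonly accompany a FAIR notebook. It is not taken from the guide itself: the function name `check_repo`, the `FAIR_CHECKS` mapping, and the specific file names (for example `CITATION.cff` as a place to record a DOI, or `environment.yml` as an environment description that Binder-style services can build from) are assumptions based on common community conventions.

```python
from pathlib import Path

# Hypothetical mapping from the recommendations in the abstract to files a
# repository might contain. The file names are common conventions and are
# assumptions for this sketch, not requirements stated by the guide.
FAIR_CHECKS = {
    "licence": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "environment": ["environment.yml", "requirements.txt", "Dockerfile"],
    "citation": ["CITATION.cff"],
    "readme": ["README.md", "README.rst", "README.txt"],
}


def check_repo(repo_dir):
    """Return a dict mapping each check name to True or False."""
    root = Path(repo_dir)
    report = {
        name: any((root / candidate).is_file() for candidate in candidates)
        for name, candidates in FAIR_CHECKS.items()
    }
    # A notebook repository should contain at least one .ipynb file
    # (only the top level of the directory is inspected here).
    report["notebook"] = next(root.glob("*.ipynb"), None) is not None
    return report


if __name__ == "__main__":
    # Report on the current working directory.
    for name, present in check_repo(".").items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Running the script from the top level of a notebook repository prints one line per check. A file being present does not by itself make a notebook FAIR; the sketch only flags obvious gaps, such as a missing licence or environment description.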
Biography
Authors’ names: Dr Aleem Uddin
Affiliations, and biography: Aleem is a Research Infrastructure Specialist (Virtualisation) at ARDC.
https://orcid.org/0000-0002-8519-5534
Authors’ names: Dr Muhammad Ali
Affiliations, and biography: Muhammad is a Research Data Specialist (Data Architecture) at ARDC.
Authors’ names: Dr Richard Ferrers
Affiliations, and biography: Richard is a Data Consultant at ARDC.
Authors’ names: Dr Robin Burgess
Affiliations, and biography: Robin is a Research Data Specialist (Data Governance) at ARDC.
Authors’ names: Sonia Ramza
Affiliations, and biography: Sonia is User Support Manager (Nectar) at ARDC.
Authors’ names: Matthias Liffers
Affiliations, and biography: Matthias is Product Manager (PIDS) at ARDC.
Authors’ names: Dr Tom Honeyman
Affiliations, and biography: Tom is Program Manager (Software) at ARDC.
Authors’ names: Dr Paula Martinez
Affiliations, and biography: Paula is Project Coordinator (Software) at ARDC.