FAIR Health Data – Transformation Reliability Model

Dr Esmond Urwin (1, 2), Mr Andy Rae (1, 2), Dr Grazziela Figueredo (1, 2), Professor Phil Quinlan (1, 2)

(1) University of Nottingham, Nottingham, United Kingdom; (2) Health Data Research UK, London, United Kingdom

Biography:

Esmond completed his undergraduate degree in Manufacturing Engineering (BEng (Hons)) at the University of Hertfordshire and then worked as a production manager and troubleshooter (à la Sir John Harvey-Jones) for a multinational food company across the UK and Europe. Seeking a change in direction, he went on to postgraduate study at the University of Nottingham, completing an MSc in Manufacturing Engineering & Management and a PhD in Knowledge Engineering. After a post-doctoral position at Nottingham, he moved to Loughborough University, where his academic career focused on knowledge management, informatics, interoperability and ontology design, predominantly for the defence and aerospace industries. This focus drew him to standardisation activities, and he co-wrote the international standard ISO 20534 for Industrial automation systems and integration.

Moving back to the University of Nottingham in 2020, Esmond was part of the large CO-CONNECT project during the COVID-19 pandemic. His work focused on applying and implementing OHDSI’s OMOP common data model to structure, represent and standardise disparate national COVID-19 health datasets for discoverability. He also developed a national minimum data standard for COVID-19 serology laboratory data, in conjunction with the National Pathology Exchange (NPEx) and the NHS, to improve national reporting of granular COVID-19 serology data. To further support healthcare terminology standardisation and data representation, he developed, in conjunction with the University of Dundee, the CO-CONNECT controlled healthcare vocabulary, which contains concepts representing serology, medical conditions, medical observations, ethnicity, laboratory systems and specimens. Further work with BBMRI-ERIC across Europe focused on updating the Minimum Information About BIobank data Sharing (MIABIS) from version 2 to version 3; the specimen concepts in the CO-CONNECT vocabulary directly support MIABIS.

Esmond’s current position with the NIHR Nottingham Biomedical Research Centre at the University of Nottingham focuses on healthcare data standardisation, the further development and better use of OMOP, and vocabulary creation for FAIR data purposes. He currently collaborates with Health Data Research UK, ELIXIR, OHDSI, UK Biobank, the National Research Data Infrastructure for Personal Health, the UK Longitudinal Linkage Collaboration and Data and Our Future Health.

Abstract:

Introduction

Data is being created at an ever-greater rate, and health data in particular is recognised as often underutilised. Common Data Models (CDMs), such as the Observational Medical Outcomes Partnership (OMOP) CDM, help make data Findable, Accessible, Interoperable and Reusable (FAIR). When data is transformed to OMOP, many factors introduce variability, from differing systems and software to the perspectives and expertise of the people involved. Yet there are no formal approaches to representing the context and provenance of OMOP health datasets throughout their lifecycle.

Methods

Data was collected at a workshop entitled ‘How to be FAIR with Data Standards’, attended by 55 practitioners and healthcare professionals from the United Kingdom and Europe. The qualitative data collected was analysed using grounded theory coding methods.

Results

A data context model comprising seventeen concepts has been created. Its three key facets are Standards, Quality and Provenance, which are supported by Measurement (composed of Metrics, Accuracy and Completeness), People, Decisions, OMOP Rule Sets, Sharing, Transformation, Format, Datasets and Bias. Context influences six of these concepts.
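To make the structure of the model more concrete, the named concepts can be encoded as a small data structure. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Concept` class, the `build_model` function and the flat dictionary layout are hypothetical, and only the one composition relationship the abstract states (Measurement composed of Metrics, Accuracy and Completeness) is encoded. Which six concepts Context influences is not specified, so that relationship is left empty rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in a hypothetical encoding of the data context model."""

    name: str
    # "Composed of" relationships; only Measurement has these in the abstract.
    parts: list["Concept"] = field(default_factory=list)


# The three key facets named in the Results.
FACETS = ["Standards", "Quality", "Provenance"]

# Supporting concepts named in the Results (Measurement is handled separately).
SUPPORTING = [
    "People", "Decisions", "OMOP Rule Sets", "Sharing",
    "Transformation", "Format", "Datasets", "Bias",
]

# The abstract says Context influences six concepts but does not name them,
# so this set is intentionally left empty rather than guessed.
INFLUENCED_BY_CONTEXT: set[str] = set()


def build_model() -> dict[str, Concept]:
    """Return the named concepts keyed by name (illustrative sketch only)."""
    model = {name: Concept(name) for name in FACETS + SUPPORTING}
    # Measurement is the only concept whose composition is stated.
    model["Measurement"] = Concept(
        "Measurement",
        parts=[Concept("Metrics"), Concept("Accuracy"), Concept("Completeness")],
    )
    model["Context"] = Concept("Context")
    return model


if __name__ == "__main__":
    model = build_model()
    # Prints the stated components of Measurement.
    print([p.name for p in model["Measurement"].parts])
```

A dictionary keyed by concept name keeps lookups simple while the model is preliminary; further relationships could be added as edges once they have been validated with domain experts.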

Conclusion

A preliminary data context model has been created to represent the amorphous aspects of context. It has a dual purpose: it can support the transformation of data to CDMs while also aiding comprehension of how data has been transformed throughout its lifecycle. The model aims to reduce dataset ambiguity and variability both at source and once the data has been transformed into a CDM.

Further work will focus on developing and validating the model with domain experts, including through an adapted version of the workshop at the ELIXIR All-Hands 2025 meeting.
