h5pydantic, from a Synchrotron through Python to HDF5
Clinton Roy, ANSTO, Clayton, Victoria, Australia
Abstract
HDF5 (Hierarchical Data Format, version 5) files are structured binary files designed to store scientific and mathematical data. The format is very flexible and allows any internal layout, which is a technical strength. This flexibility poses a challenge to scientists, however: each discipline needs a well-defined, well-documented HDF5 layout in order to publish work, share results, and, crucially, grow a software ecosystem.
The typical software engineering approach to using a storage format is to model the domain using classes and then use a library to transform these classes semi-automatically into the storage format. The best-known family of such libraries is the Object-Relational Mapper (ORM).
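As a rough illustration of the class-to-storage idea behind ORMs, the sketch below derives a SQLite table from a Python dataclass using only the standard library. The `Sample` class, its fields, and the helpers `create_table` and `insert` are hypothetical names invented for this example; real ORMs do considerably more.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


# Hypothetical model class, invented for illustration only.
@dataclass
class Sample:
    name: str
    temperature_k: float
    exposure_count: int


# Map Python annotation types to SQLite column types.
SQL_TYPES = {str: "TEXT", float: "REAL", int: "INTEGER"}


def create_table(conn, model):
    """Create a table whose columns mirror the dataclass fields."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model))
    conn.execute(f"CREATE TABLE {model.__name__} ({columns})")


def insert(conn, obj):
    """Insert one dataclass instance as a row."""
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {type(obj).__name__} VALUES ({placeholders})",
        astuple(obj),
    )


conn = sqlite3.connect(":memory:")
create_table(conn, Sample)
insert(conn, Sample("quartz", 293.15, 12))
rows = conn.execute(
    "SELECT name, temperature_k, exposure_count FROM Sample"
).fetchall()
```

The class definition is the single source of truth here: the table layout is derived from it rather than written out by hand.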
In the Python ecosystem, Pydantic is a very popular modelling library. h5pydantic builds on Pydantic to let scientists map Python models to HDF5 files. It hides neither the underlying Pydantic library nor the h5py library, and it helps scientists document their HDF5 format, even outside the Python ecosystem.
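To give a feel for what mapping a model to an HDF5 file involves, here is a minimal hand-written sketch that uses Pydantic and h5py directly. It is not h5pydantic's API: the `Scan` model, its fields, the `write_model` helper, and the choice to store scalars as attributes and lists as datasets are all assumptions made for illustration.

```python
import os
import tempfile
from typing import List

import h5py
from pydantic import BaseModel


# Hypothetical model, invented for illustration only.
class Scan(BaseModel):
    sample_name: str
    energy_kev: float
    counts: List[int]


def write_model(group, model):
    """Store list fields as HDF5 datasets and scalar fields as attributes."""
    # Iterating a Pydantic model yields (field name, value) pairs.
    for name, value in model:
        if isinstance(value, list):
            group.create_dataset(name, data=value)
        else:
            group.attrs[name] = value


path = os.path.join(tempfile.mkdtemp(), "scan.h5")
scan = Scan(sample_name="quartz", energy_kev=12.4, counts=[3, 5, 8])

# Write the validated model into a group of a new HDF5 file.
with h5py.File(path, "w") as f:
    write_model(f.create_group("scan"), scan)

# Read the values back with plain h5py to confirm the layout.
with h5py.File(path, "r") as f:
    stored_name = f["scan"].attrs["sample_name"]
    stored_energy = float(f["scan"].attrs["energy_kev"])
    stored_counts = f["scan/counts"][:].tolist()
```

Because the layout follows from the model definition, the same class can serve as both the validation schema and a description of the file format.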
Biography
Clinton is an Open Source software engineer whose career has centred on helping researchers in diverse fields advance their work. Clinton helps organize Open Source conferences in Australia through Linux Australia.