PITSCHI: Particle Imaging depoT using Storage CacHing Infrastructure
Dr Hoang Nguyen1, Dr Rubbiya Ali2, Professor David Abramson1, Professor Roger Wepf2
1Research Computing Centre, The University of Queensland, Australia
2Centre for Microscopy and Microanalysis, The University of Queensland, Australia
Scientific imaging instruments with modern, fast CMOS detectors generate increasingly large datasets, which makes data management ever more critical. This is particularly true for large multiuser facilities such as the Centre for Microscopy and Microanalysis (CMM) at The University of Queensland (UQ), which operates a wide range of microscopes, many of them big data producers. A central repository that stores, indexes, and annotates data not only allows researchers to search, browse, and retrieve their data easily, but also makes it possible to harvest metadata to enrich these datasets.
We present Pitschi, a data repository for CMM scientific instruments that adheres to the FAIR data principles. It is part of the Australian Characterisation Commons at Scale (ACCS) project, which is funded by the Australian Research Data Commons. Pitschi is based on the open-source Clowder data management framework from the National Center for Supercomputing Applications in the US, and it is fully integrated with the instrument booking system and the storage infrastructure at UQ. Pitschi provides end-to-end data management, from capturing raw data, to transferring them to a storage collection, and finally to ingesting and indexing them in the repository. As part of the ingest process, metadata are extracted automatically from supported file types and then used to facilitate search and discovery. Once the data are ingested into Pitschi, they are available on various platforms, including HPC systems, personal computers, and processing platforms such as CVL. Data transport is arranged transparently using the Metropolitan Data Caching Infrastructure (MeDICI).
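The ingest step described above (walk a storage collection, extract metadata from supported file types, index the result for search) can be illustrated with a minimal sketch. It is not Pitschi's implementation and does not use the Clowder API; the names `EXTRACTORS`, `basic_metadata`, `ingest`, and `search`, and the choice of `.tif` and `.mrc` as supported types, are assumptions made purely for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical extractor type: takes a file path, returns metadata fields.
Extractor = Callable[[Path], Dict[str, str]]


def basic_metadata(path: Path) -> Dict[str, str]:
    """Illustrative extractor: records only the file name and size."""
    return {"name": path.name, "size_bytes": str(path.stat().st_size)}


# Assumed set of supported file types; a real repository would register
# format-specific extractors that read instrument metadata from headers.
EXTRACTORS: Dict[str, Extractor] = {
    ".tif": basic_metadata,
    ".mrc": basic_metadata,
}


def ingest(root: Path) -> List[Dict[str, str]]:
    """Walk a storage collection and index every file of a supported type."""
    index: List[Dict[str, str]] = []
    for path in sorted(root.rglob("*")):
        extractor = EXTRACTORS.get(path.suffix.lower())
        if path.is_file() and extractor is not None:
            record = extractor(path)
            record["path"] = str(path.relative_to(root))
            index.append(record)
    return index


def search(index: List[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Return records with any metadata value containing the term (case-insensitive)."""
    term = term.lower()
    return [r for r in index if any(term in v.lower() for v in r.values())]
```

Calling `ingest(Path("/path/to/collection"))` returns one metadata record per supported file, and `search(index, "grid")` filters those records. Keeping extractors in a per-suffix registry is one way to add support for new file types without changing the ingest loop.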
Hoang Nguyen is a computer scientist with a background in high-throughput and high-performance computing. He works at the Research Computing Centre at The University of Queensland. His interests revolve around scientific workflows and the different ways users interact with them. Since joining the ACCS project in late 2020, he has become more involved in data management.