Dr Rubbiya Ali¹, Dr Mark Endrei², Tom Mason¹, Nishanthi Dasanayaka², Jake Carroll², Prof. Roger Wepf¹
¹Centre for Microscopy and Microanalysis, The University of Queensland, St Lucia, Queensland, Australia; ²Research Computing Centre, The University of Queensland, St Lucia, Queensland, Australia
Biography:
Dr. Rubbiya Ali is the Data Informatics Manager at the Centre for Microscopy and Microanalysis, The University of Queensland. With a PhD from UQ’s Institute for Molecular Bioscience, her expertise lies in computational image analysis, large-scale microscopy data workflows, and scientific data management. She has developed novel algorithms (3D BLE and RAZA) for 3D edge detection and 3D particle picking in electron tomography, and has played a key role in designing and implementing PITSCHI, a scalable, FAIR-aligned platform that enables deep integration of data capture, storage, and analysis for large-scale imaging workflows.
Abstract:
Modern high-throughput imaging technologies—such as fast CMOS detectors and high-resolution microscopy—routinely generate multi-terabyte datasets that challenge conventional data handling models. At the Centre for Microscopy and Microanalysis (CMM), The University of Queensland (UQ), over 700 researchers from academia and industry access more than 50 instruments across five facilities. For such a complex environment, robust, automated, and FAIR-compliant data infrastructure is essential.
To address this need, CMM and UQ’s Research Computing Centre (RCC) developed PITSCHI (Particle Image depoT using Storage CacHing Infrastructure), a scalable, open-source imaging data platform built on Clowder (NCSA, USA) and aligned with the Australian Characterisation Commons at Scale (funded by ARDC in collaboration with Microscopy Australia).
PITSCHI automates data capture, secure transfer, and ingestion into a searchable, metadata-rich repository. It integrates deeply with UQ’s research ecosystem—including the Research Infrastructure Management System (RIMS), Research Data Manager (RDM), and the MeDiCI high-performance storage fabric—enabling seamless data flow from acquisition to archiving.
A key innovation is the implementation of Persistent Identifiers (PIDs) at dataset, instrument, and facility levels. This supports traceability, reproducibility, and reusability of data, while enabling automated metadata enrichment and reducing manual curation overhead.
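As a rough illustration of how PIDs at the three levels could drive automated metadata enrichment, the sketch below resolves a dataset's instrument PID, then that instrument's facility PID, and merges the resolved details into the dataset's metadata. It is a minimal sketch under stated assumptions, not PITSCHI's implementation: the class names, field names, the `enrich` function, the in-memory registries, and the `pid:example/...` identifier strings are all hypothetical.

```python
from dataclasses import dataclass, field

# Everything below is hypothetical and for illustration only: the classes,
# field names, and "pid:example/..." strings are NOT PITSCHI's actual schema
# or PID format.


@dataclass(frozen=True)
class Facility:
    pid: str
    name: str


@dataclass(frozen=True)
class Instrument:
    pid: str
    name: str
    facility_pid: str  # link from instrument level up to facility level


@dataclass
class Dataset:
    pid: str
    path: str
    instrument_pid: str  # link from dataset level up to instrument level
    metadata: dict = field(default_factory=dict)


def enrich(dataset, instruments, facilities):
    """Return a new metadata dict with instrument and facility details
    resolved from their PIDs; the dataset's own metadata is not modified."""
    instrument = instruments[dataset.instrument_pid]
    facility = facilities[instrument.facility_pid]
    enriched = dict(dataset.metadata)
    enriched.update(
        {
            "dataset_pid": dataset.pid,
            "instrument_pid": instrument.pid,
            "instrument_name": instrument.name,
            "facility_pid": facility.pid,
            "facility_name": facility.name,
        }
    )
    return enriched


# Hypothetical registries keyed by PID (stand-ins for a real PID resolver).
facilities = {
    "pid:example/facility-1": Facility("pid:example/facility-1", "Example Facility"),
}
instruments = {
    "pid:example/instrument-7": Instrument(
        "pid:example/instrument-7", "Example TEM", "pid:example/facility-1"
    ),
}
dataset = Dataset(
    pid="pid:example/dataset-42",
    path="/data/session-001",
    instrument_pid="pid:example/instrument-7",
    metadata={"operator": "researcher-a"},
)

record = enrich(dataset, instruments, facilities)
```

In this sketch the researcher supplies only the dataset-level fields; the instrument and facility details follow from the PID links, which is one way the reduction in manual curation described above could arise.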
To date, PITSCHI has ingested over 635 TB across 7.8 million files, demonstrating its capacity to support long-term, institution-wide data stewardship. Architectural insights, practical lessons from PID deployment, and the design of PITSCHI highlight how a federated, FAIR-aligned infrastructure can effectively support complex research environments.