A report on two years' work on a standards-based architecture for a Data Commons
Peter Sefton¹, Simon Musgrave¹
¹University of Queensland, St Lucia, Queensland, Australia
Abstract
The Language Data Commons of Australia (LDaCA) is a national infrastructure project focused on the preservation and use of language research data. Its goal is to secure nationally significant collections of written, spoken, multi-modal, and signed language materials and make them accessible to communities and researchers. LDaCA is developing standards, technologies, tools, and systems to catalogue, index, and publish data. The project follows the FAIR principles, making data findable, accessible, interoperable, and reusable with appropriate access controls, and the CARE principles, with a focus on Indigenous data governance. LDaCA provides support and guidance for managing language data, including advice on data formats for longevity. It also offers user-friendly tools for creating and reformatting metadata, training in working with data and metadata, and access protocols for data discovery and accessibility, including a data API that is integrated with BinderHub for on-demand research notebooks. LDaCA’s technical architecture principles prioritize data integrity and long-term accessibility through a standards-based approach: the Research Object Crate (RO-Crate) metadata framework is used for packaging and describing objects, and the Oxford Common File Layout (OCFL) provides a structured and robust storage format for those objects. This approach avoids data lock-in and ensures data longevity by decoupling archival storage and distribution from specific software. Because the technical and metadata standards are domain-agnostic and easily adaptable, this work also has potential applications in other disciplines and domains.
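As a rough illustration of the packaging and storage approach described above, the sketch below builds a minimal RO-Crate 1.1 metadata document and maps each file to the place its content would occupy in the first version of an OCFL object. It is not LDaCA's code: the function names `build_crate_metadata` and `ocfl_content_paths`, the collection name, and the file path are hypothetical, and a real OCFL object also needs an inventory with content digests, which is omitted here.

```python
import json


def build_crate_metadata(name, description, files):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    Hypothetical helper for illustration only.
    """
    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: describes the packaged object as a whole.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per packaged file.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def ocfl_content_paths(files, version="v1"):
    """Map logical paths to content paths inside one OCFL object version.

    Hypothetical helper; assumes the default 'content' directory name.
    """
    return {path: f"{version}/content/{path}" for path in files}


if __name__ == "__main__":
    files = ["audio/interview-01.wav"]  # assumed example path
    crate = build_crate_metadata("Example collection", "Illustrative only", files)
    print(json.dumps(crate, indent=2))
    print(ocfl_content_paths(files + ["ro-crate-metadata.json"]))
```

Because both the metadata and the storage layout are plain files following published specifications, an object packaged this way can be read without any particular repository software.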
Biography
Peter Sefton is an eResearch expert specialising in software development, research data management, and metadata. He currently leads the technology and infrastructure team for the Language Research Data Commons project at the University of Queensland.