Dr Peter Sefton1, Nick Thieberger, Marco La Rosa, Simon Musgrave, River Tae Smith, Moises Sacal Bonequi
1University of Queensland, St Lucia, Australia
This presentation discusses an ongoing standardisation effort for language data, designed to improve interoperability, reduce costs for data migration and allow storage on disk, object storage or in archival repositories.
RO-Crate is a linked-data metadata system that allows discovery metadata (who, what, where), based on the widely adopted Schema.org vocabulary, to be seamlessly integrated with more discipline-specific metadata. RO-Crate uses metadata profiles to provide guidance for packaging resources for particular disciplines and purposes.
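To illustrate how discovery metadata is expressed, the following is a minimal sketch of an RO-Crate 1.1 metadata document built in Python. The metadata descriptor, the root dataset and the context URL follow the published RO-Crate 1.1 specification; the function name, the `#language` identifier and the sample values are hypothetical and are not taken from the language data profile itself.

```python
import json


def build_minimal_crate(name, description, language_name):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    The sample values and the "#language" identifier are illustrative
    assumptions, not terms defined by the language data profile.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file as an RO-Crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: carries Schema.org discovery metadata.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "inLanguage": {"@id": "#language"},
            },
            {
                # Contextual entity describing the language of the material.
                "@id": "#language",
                "@type": "Language",
                "name": language_name,
            },
        ],
    }


crate = build_minimal_crate(
    name="Example recordings",
    description="A hypothetical collection of recorded narratives.",
    language_name="Example language",
)
print(json.dumps(crate, indent=2))
```

Saved as `ro-crate-metadata.json` in the root of a directory, a document of this shape is what makes that directory an RO-Crate; a profile then adds guidance on which further types and properties to use.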
In this presentation we will introduce a metadata profile of RO-Crate for language data, which extends the core RO-Crate standard with new vocabulary terms adapted from pre-linked-data, discipline-specific metadata efforts, particularly the Open Language Archives Community (OLAC) and IMDI standards. The profile includes English-language guidance on how to structure collections of resources in a repository with links between them, so that they can be indexed and displayed via APIs and search/browse portals. The profile is also implemented as a series of machine-readable profiles for the Describo Online metadata description system.
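As a rough sketch of how links between resources can be followed for indexing, the example below resolves `@id` references in a flattened RO-Crate graph. It uses only the Schema.org `hasPart` property from core RO-Crate; the linking terms actually defined by the language data profile may differ, and the helper names, file names and sample entities here are hypothetical.

```python
def index_by_id(crate):
    """Map each entity's @id to the entity, for quick lookup."""
    return {entity["@id"]: entity for entity in crate["@graph"]}


def resolve_parts(crate, entity_id):
    """Return the entities that entity_id links to via hasPart.

    hasPart is the Schema.org property used by core RO-Crate; it stands in
    here for whatever linking terms a given profile prescribes (assumption).
    """
    entities = index_by_id(crate)
    parts = entities[entity_id].get("hasPart", [])
    if isinstance(parts, dict):
        # JSON-LD allows a single reference in place of a list.
        parts = [parts]
    return [entities[ref["@id"]] for ref in parts if ref["@id"] in entities]


# Hypothetical crate: one dataset with two files.
example_crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example session",
            "hasPart": [{"@id": "recording.wav"}, {"@id": "transcript.txt"}],
        },
        {"@id": "recording.wav", "@type": "File", "name": "Audio recording"},
        {"@id": "transcript.txt", "@type": "File", "name": "Transcript"},
    ],
}

part_names = [part["name"] for part in resolve_parts(example_crate, "./")]
print(part_names)
```

Because every entity sits in one flat `@graph` list and refers to others by `@id`, an indexer can build the lookup table once and follow links in either direction without parsing nested structures.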
We will demonstrate describing items in a variety of languages and modes (spoken, written and signed), from a large set of heterogeneous language resources held by PARADISEC and the Language Data Commons of Australia. We will also show how to access them via API calls and a search portal, and how resources may be stored in simple storage systems using the Arkisto platform (a set of standards and principles).
Biography:
Peter Sefton is an eResearch expert specialising in software development, research data management and metadata. He currently leads the technology and infrastructure team for the Language Research Data Commons project at the University of Queensland.