Mr Michael Mallon1, Mr Jake Carroll2, Dr David Abramson3
1RCC The University of Queensland, St Lucia, Australia, firstname.lastname@example.org
2QBI The University of Queensland, St Lucia, Australia, email@example.com
3RCC The University of Queensland, St Lucia, Australia, firstname.lastname@example.org
The increasing amount of data being collected from simulations, instruments and sensors creates challenges for existing e-Science infrastructure. In particular, it requires new ways of storing, distributing and processing data in order to cope with both the volume and velocity of the data. In addition, maintaining separate silos of storage for different access technologies becomes both a barrier to migration between technologies and a significant source of inefficiency in providing data services as the volume and/or velocity of data increases. The University of Queensland has recently designed and deployed MeDiCI, a data fabric that spans the metropolitan area and provides seamless access to data regardless of where it is created, manipulated and archived. MeDiCI is novel in that it exploits temporal and spatial locality to mask network limitations. This means that data only needs to reside locally in high-speed storage whilst being manipulated, and it can be archived transparently in high-capacity, but slower, technologies at other times. MeDiCI is built on commercially available technologies, namely IBM’s Spectrum Scale (formerly GPFS) and HPE/SGI Data Migration Facility (DMF). In this talk I will describe these innovations and, in particular, how we are able to avoid storage silos when providing access to the data in the fabric.
Caches all the way down
MeDiCI rests on two fundamental concepts. First, locality makes it possible to store data centrally in a dedicated storage system, but cache it temporarily where it is generated or used. Caching creates the illusion of uniform high-speed access from all platforms, when in fact there may be a mix of speeds-and-feeds between systems. Second, the number of physical copies kept for fault tolerance should not be confused with the ways in which users wish to access the data. We keep only one logical copy of a data set, which can then be exposed through a variety of access mechanisms to suit the applications. Where multiple physical copies are stored for fault tolerance, only one is regarded as the primary data instance. We implement this resilient file store on our existing DMF infrastructure.
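The first concept can be illustrated with a toy read-through cache. This is a minimal sketch under stated assumptions, not the MeDiCI implementation: the class name, the least-recently-used eviction rule and the file names are all hypothetical, and a plain dictionary stands in for the central store.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small, fast local cache in front of a large, slower central store.

    Hypothetical model: `home` is a dict standing in for the central store,
    and eviction is least-recently-used (an assumption, not a MeDiCI detail).
    """

    def __init__(self, home, capacity):
        self.home = home              # central store holding the primary copy
        self.capacity = capacity      # maximum number of files kept locally
        self.local = OrderedDict()    # least recently used entry comes first
        self.home_reads = 0           # how often we had to cross the network

    def read(self, path):
        if path in self.local:
            # Cache hit: temporal locality pays off, no trip to the home store.
            self.local.move_to_end(path)
            return self.local[path]
        # Cache miss: fetch from the central store and keep a local copy.
        self.home_reads += 1
        data = self.home[path]
        self.local[path] = data
        if len(self.local) > self.capacity:
            # Evict the least recently used file; the home store still has it.
            self.local.popitem(last=False)
        return data


home = {"a.dat": b"A", "b.dat": b"B", "c.dat": b"C"}
cache = ReadThroughCache(home, capacity=2)
for name in ["a.dat", "a.dat", "b.dat", "a.dat", "c.dat"]:
    cache.read(name)
```

Repeated reads of the same file are served locally, and evicting a file never loses data because the primary copy stays in the central store.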
Moving data out of the datacenter: Active File Management
Active File Management (AFM) is a feature of IBM’s Spectrum Scale that allows a remote filesystem to be cached in a GPFS filesystem. This remote (or “home”) filesystem may be either an NFS export or a GPFS remote filesystem mount. Several modes of operation may be used to establish an AFM relationship between a home and a cache fileset (a fileset is a logical separation for a set of files in a GPFS filesystem). In the MeDiCI context, the most commonly used caching modes are independent-writer (where multiple caches may update files on a home share), single-writer (the cache assumes that it is “correct” and makes the home match its state) and read-only. There is no file locking across the AFM relationship and data transfer is asynchronous, which makes the AFM relationship closer to eventual consistency than to immediate consistency. The GPFS clusters on either side of the AFM relationship remain immediately consistent and fully POSIX compliant; it is only across the AFM relationship that immediate consistency is not implemented. This is particularly important for independent-writer mode, where files can be updated at the cache or at home. Care must be taken to ensure that files are not modified at both home and cache at the same time; otherwise, the “last” write will win.
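The “last write wins” behaviour can be shown with a toy model of two caches pushing asynchronously to one home. This is only a sketch of the consistency behaviour described above, not the Spectrum Scale API: the classes, the explicit flush() call and the file name are hypothetical.

```python
class Home:
    """Stands in for the home fileset: the single primary copy of each file."""

    def __init__(self):
        self.files = {}


class Cache:
    """Stands in for one independent-writer cache fileset (hypothetical model)."""

    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.files = {}    # local view, updated immediately on write
        self.queue = []    # updates not yet pushed to home

    def write(self, path, data):
        # The write is visible locally at once, but only queued for home.
        self.files[path] = data
        self.queue.append((path, data))

    def flush(self):
        # Asynchronous push with no locking: whatever home holds is overwritten.
        for path, data in self.queue:
            self.home.files[path] = data
        self.queue.clear()


home = Home()
site_a = Cache("site-a", home)
site_b = Cache("site-b", home)

site_a.write("results.csv", "from A")   # written first
site_b.write("results.csv", "from B")   # written second
site_b.flush()                          # B happens to reach home first
site_a.flush()                          # A arrives later and overwrites B
```

After both flushes, home holds the version from site A even though site B wrote later in wall-clock time. “Last” here means last to arrive at home, which is why concurrent modification of the same file at two sites should be avoided.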
Crossing Institutional Borders: Id Mapping
One of the great challenges in sharing data between organizations is that each maintains independent identities. Over the past decade, the Australian Access Federation (AAF) has done significant work in abstracting away identities for HTTPS workloads. While this work has been a success, there have been significant challenges in leveraging it for POSIX identities. The id mapping feature of Spectrum Scale allows us to sidestep this issue. This feature allows on-the-fly mapping of POSIX user, group and ACL ids to and from a globally unique namespace. We have chosen AAF’s auEduPersonSharedToken attribute as the globally unique namespace for user identities. The reason for this choice is that QCIF’s POSIX identities are managed via an AAF-based portal, which can capture the attribute at account creation time. At the same time, the issuing institution stores the attribute in either LDAP or a MySQL database, linking it to a local institutional credential, so both sites have easy access to the attribute. This significantly simplifies the management of the mapping scripts and leverages existing work.
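The two-step translation (local id to global token, then global token to the other site's local id) can be sketched as follows. This is an illustrative sketch only, not the Spectrum Scale mapping script interface: the class, the function, the placeholder token strings, the uid values and the fallback uid of 65534 are all assumptions made for the example.

```python
class SiteIdMap:
    """One site's table linking global tokens to local POSIX uids (hypothetical)."""

    def __init__(self, uid_by_token):
        self.uid_by_token = dict(uid_by_token)
        # Reverse table so a local uid can be turned back into its token.
        self.token_by_uid = {uid: tok for tok, uid in self.uid_by_token.items()}

    def to_global(self, uid):
        return self.token_by_uid.get(uid)

    def to_local(self, token):
        return self.uid_by_token.get(token)


def translate_uid(uid, source, target, fallback_uid=65534):
    """Map a uid at `source` to the matching uid at `target` via the global token.

    Unmapped users fall back to `fallback_uid` (an assumed policy, chosen here
    only so the example has defined behaviour).
    """
    token = source.to_global(uid)
    if token is None:
        return fallback_uid
    local = target.to_local(token)
    return fallback_uid if local is None else local


# Placeholder tokens; real auEduPersonSharedToken values look different.
institution = SiteIdMap({"token-alice": 1001, "token-bob": 1002})
qcif = SiteIdMap({"token-alice": 52011})

alice_at_qcif = translate_uid(1001, institution, qcif)   # known at both sites
bob_at_qcif = translate_uid(1002, institution, qcif)     # no account at QCIF
```

Because each site only needs a table keyed on the shared token, neither site has to know the other's local uid allocation.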
One copy, any way you want it
The final key to providing seamless access to data is breaking down the barriers between storage silos. The biggest obstacle has been the significant difference between POSIX semantics and object/S3 semantics. One approach has been to build a POSIX-style file store on top of objects (e.g. an S3 storage gateway). While this may work for some workloads, it typically breaks down when multiple writers are involved. The reverse solution is to build an object gateway on top of a POSIX filesystem. This provides a fully POSIX-compliant filesystem, but it means that the mapping between files and objects must be refreshed periodically. This is usually acceptable because an S3/Swift endpoint is eventually consistent, so S3/Swift applications typically need to be designed to tolerate delays in updates or changes to objects.
GPFS provides the second type of solution to unifying file and object workloads, via the unified file and object interface for OpenStack Swift. This style of object storage is deployed as a separate storage policy and is backed by a custom storage backend, ‘swiftonfile’. This backend encodes account and container information as a two-layer directory tree, with objects sitting underneath it. A periodic process scans the directory tree for updates. IBM has also developed a Swift middleware aimed at providing a mechanism for running Swift on high-latency media. This allows data to be staged in and out of the cache, eliminating concerns about Swift HTTP timeouts when accessing data that has been evicted from all disk caches.
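The account/container/object layout described above can be sketched as a pair of path-mapping functions. This is a simplified illustration under stated assumptions, not swiftonfile's actual code: the function names, the root directory and the account and container names are hypothetical, and the real on-disk layout may include additional directory levels.

```python
from pathlib import PurePosixPath


def object_to_path(root, account, container, obj):
    """Return the file path that would back a given object.

    Two directory layers (account, then container) with the object beneath.
    """
    return PurePosixPath(root) / account / container / obj


def path_to_object(root, path):
    """Recover (account, container, object name) from a file path under root."""
    rel = PurePosixPath(path).relative_to(root)
    # Unpacking raises ValueError if the path is shallower than account/container.
    account, container, *rest = rel.parts
    if not rest:
        raise ValueError("path names a container, not an object")
    # Object names may themselves contain slashes (pseudo-directories).
    return account, container, "/".join(rest)


# Hypothetical root, account and container names.
path = object_to_path("/gpfs/objects", "AUTH_demo", "scans", "run1/img.tif")
triple = path_to_object("/gpfs/objects", path)
```

Since the mapping is a plain directory convention, a file written through a POSIX mount appears at a predictable object name, and the periodic scan only has to make the object layer aware of it.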
At the time of writing, we have demonstrated surfacing a single logical copy of data via native GPFS client mounts into HPC clusters, NFS exports into QCIF-managed services, CIFS mounts to PCs and scientific instruments using institutional credentials, Nextcloud, S3/Swift using institutional credentials, and Swift using NeCTAR Keystone credentials.
- McFedries, P. “The Coming Data Deluge”, IEEE Spectrum, Feb 2011.
- IBM Spectrum Scale, https://www.ibm.com/us-en/marketplace/scale-out-file-and-object-storage
- SGI DMF, http://www.sgi.com/products/storage/tiered/dmf.html
- Active File Management, https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_afm.htm
- auEduPersonSharedToken, http://wiki.aaf.edu.au/tech-info/attributes/auedupersonsharedtoken
- Amazon Web Services, “AWS Storage Gateway”, Apr 2017. https://d0.awsstatic.com/whitepapers/Storage/aws-storage-gateway-file-gateway-for-hybrid-architectures.pdf
- Unified file and object interface, https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1ins_unifiedaccessoverview.htm
- IBM redbooks, http://www.redbooks.ibm.com/redpapers/abstracts/redp5430.html
Michael has been working for the Research Computing Centre at UQ for 5 years in various devops and support capacities. His current role in the RCC is developing and supporting the Queensland NeCTAR and RDS facilities for QCIF. He has expertise in HPC, Cloud, Storage and Networking. He holds a Bachelor of Science (Hons) in Physics and a Bachelor of Engineering (Hons) in Software Engineering, both awarded by the University of Queensland.