Data movement as a service: “PO Boxes for data with managed couriering in between”
Chris Myers3, Mr Guido Aben1, Mrs Crystal Chua2
1AARNet, Perth, Australia
2AARNet, Brisbane, Australia
This presentation highlights a tenacious challenge in bulk data movement that has persisted despite decades of relentless network speed gains. We then propose a way forward.
The challenge is this:
- Only researchers who are closely inducted into the eInfrastructure ecosystem can reliably move bulk data quickly. All other researchers risk running into problems with tooling, training, or infrastructure. These problems are opaque, change over time, and require relationship building to get fixed. In real terms, it is simply too hard for “the average individual researcher” to move meaningful amounts of data from A to B.
- Not solving this problem will be a serious issue for advanced collaborative Science Cloud / Science Commons proposals (such as ARDC or EOSC), which aim to serve the infrastructure needs of all public sector researchers through central (“cloudy”) service provisioning, while data generation (think campus-based instruments) remains a function of the network edges.
- Researchers need a network layer that isn’t about bits-on-the-wire, but instead speaks in “addressable data objects” that can be told where to be, and when, as part of programmable science workflows.
This presentation describes recent international work towards such a new inter-NREN layer of infrastructure for data movement and orchestration. This infrastructure will allow data movement to be expressed in terms of time, endpoints, and data objects. It will allow researchers to articulate needs in terms of what data is needed, where, and when, rather than in terms of network links and capacities.
Guido Aben is AARNet’s director of eInfrastructure partnerships. He holds an MSc in physics from Utrecht University. In his current role at AARNet, he is responsible for building partnerships and fostering collaborations between like-minded entities in international eScience and eInfrastructure, all in order to build services to researchers’ requirements. His current prime interest is in federated large data collaboration systems, but he’s always on the lookout for new diamonds in the rough.