Accelerate research and reduce costs with cloud optimised storage formats

Mr Steven Gillard¹

¹Amazon Web Services, Melbourne, Australia

Biography:

As an AWS Senior Technologist and Solutions Architect, Steve brings nearly three decades of expertise in designing and implementing mission-critical and large scale data systems across Telco, Internet, Retail, Government, and Higher Education sectors. Based in Melbourne, he specialises in architecting secure, resilient, and cost-optimised solutions for ANZ public sector organisations.

Steve holds AWS certifications as an AI Practitioner and Solutions Architecture Professional, is a member of the AWS Resilience Specialist Field Community, and earned his Bachelor of Applied Science in Computer Science from RMIT.

Steve is passionate about leveraging cloud technologies to accelerate scientific research in climate change and healthcare innovation. His work focuses on helping research institutions harness the power of AWS to drive breakthrough discoveries and improved outcomes.

Abstract:

As research datasets grow to petabyte scale, traditional data formats like CSV and NetCDF are becoming significant bottlenecks in cloud environments, leading to spiralling costs, slow query times, and frustrated researchers. The solution lies in cloud-optimised storage formats that can deliver both dramatic performance improvements and substantial cost savings.

Modern cloud-optimised formats leverage the distributed nature and high throughput capabilities of cloud object storage systems like Amazon S3. By enabling compression, selective data access and parallelised scale-out reading and writing of data without requiring intermediate file system layers, these formats can improve performance by 10x or more while reducing storage and computing costs by 65%. For researchers, this means faster results and more time focusing on science rather than data movement.

This talk introduces cloud-optimised formats for different data types: Parquet for tabular data, Cloud Optimized GeoTIFF for imagery, and Zarr for multi-dimensional data. We'll also explore emerging open source technologies like Kerchunk, VirtualiZarr, and Icechunk that can achieve similar benefits without data conversion – a game-changer for organisations with large legacy datasets. You'll leave with practical strategies for data layout and chunking that balance performance with flexibility, plus clear guidance on choosing the right format for your specific use case.
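
The chunking trade-off mentioned above can be illustrated with a minimal sketch in plain Python. This is not the Zarr API: the helper `chunks_touched`, the array shape, and the three chunk shapes are all hypothetical, chosen only to show how a chunk layout determines the number of chunks (and, in a layout where each chunk is stored as its own object, roughly the number of object reads) that a given query needs.

```python
from itertools import product


def chunks_touched(selection, chunk_shape):
    """Return the grid indices of every chunk that overlaps `selection`.

    selection:   one (start, stop) pair per dimension, stop exclusive
    chunk_shape: chunk length along each dimension
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size       # chunk holding the first requested element
        last = (stop - 1) // size   # chunk holding the last requested element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# Hypothetical (time, lat, lon) array of shape (3650, 720, 1440).
time_series = ((0, 3650), (100, 101), (200, 201))  # one grid cell, all time steps
single_map = ((0, 1), (0, 720), (0, 1440))         # one time step, all grid cells

# Hypothetical chunk shapes, assumed for illustration only.
layouts = {
    "map-friendly": (1, 720, 1440),    # one chunk per time step
    "series-friendly": (3650, 10, 10), # whole time axis per small tile
    "balanced": (365, 90, 90),         # compromise between the two
}

for name, chunk_shape in layouts.items():
    n_series = len(chunks_touched(time_series, chunk_shape))
    n_map = len(chunks_touched(single_map, chunk_shape))
    print(f"{name:16s} time series: {n_series:6d} chunks   single map: {n_map:6d} chunks")
```

Under these assumed shapes, each extreme layout serves one access pattern with a single chunk and the other with thousands, while the balanced layout keeps both patterns to a modest number of reads. That is the sense in which chunking has to balance performance with flexibility.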
