What to do before your cloud infrastructure is accidentally deleted

Kenni Bawden1

1University of Melbourne, Australia

Biography:

Kenni has been developing software at universities (University of Melbourne, University of Sydney), research organisations (NICTA, Data61) and non-profits for the last 15 years.

Kenni has worked on projects ranging from AI for monitoring air pollution for the EPA, to web application security training for high-schoolers across Australia, to systems for access management, administration and inventory of research data and compute resources.

As a result, Kenni has developed strong skills in prototyping new, open-ended projects, maintaining and improving old and complex software systems, and training mixed-skill teams.

Abstract:

As seen in the Google-UniSuper incident in May this year, even experts in infrastructure, software development and maintenance can cause accidents that result in the loss of data and production software systems. What practical steps can software operations staff and developers of eResearch systems take to ensure that their systems are fault resilient?

Version control, testing, reproducible builds and backups are all important tools to be aware of and to use. Depending on your current systems, they can seem daunting, too costly or too much effort to implement.

However, at the Melbourne Research Cloud (MRC) we’ve found that we were able to implement these systems in our user-facing applications without slowing down development. In fact, we were able to make our development processes more efficient, reliable, useful and pleasant to use.

The key to success in setting up these recovery systems is to harness existing software libraries and services, provide immediate utility to your software development team and not be afraid of incremental progress. This will ensure you gain benefit from, and thus maintain, your disaster recovery systems before they are needed during a catastrophic failure.

 
