Curation before creation: guided sampling for greener high-performance computing

Dr. Amanda Parker¹, Ms. Chloe Lin¹

¹ANU, Canberra, Australia

Biography:

Dr. Amanda J. Parker has been a Research Fellow in the School of Computing at The Australian National University since 2021. Dr. Parker develops machine learning and artificial intelligence methods for the challenges of domain-specific scientific data. She conducts research in guided sampling, active learning, and green machine learning and is the lead CI of a Humanising Machine Intelligence Computing for Social Good 2023 seed grant. For the preceding three years Dr. Parker was a CSIRO Early Research Career Fellow in the Machine Learning Group. She graduated with a PhD in physics from The University of British Columbia in 2018, specialising in statistical physics, probability, soft matter, and computational science. In 2011 Dr. Parker was a Distinguished Visiting Fellow at IBM Research Almaden, and her prior studies were completed at Victoria University of Wellington.

Abstract:

Materials discovery and characterisation have been revolutionised by computational science. The cost savings, improved efficiencies, and deeper scientific insights gained from computer simulations have supported pioneering developments in solar cells, computer chips, recyclable materials, sensing technologies, catalysts, and drug discovery. The global impact of these developments is indisputable. But when evaluating the societal benefits of computational science, there is still a fly in the ointment: the direct monetary and environmental costs of high-performance computing (HPC).

Green computing efforts typically focus on optimised computer architecture and software development. However, in this work we assume that the environmental and general costs of HPC are fixed, and aim to maximise how effectively those resources are used by benchmarking guided sampling and active learning methods that direct which computational experiments are conducted.

To achieve this, we used existing publicly available datasets and, where appropriate, benchmarked improved model performance against the cost to acquire data. We undertook a meta-analysis of computational materials science and chemistry projects supported by the National Computational Merit Allocation Scheme (NCMAS) from 2019 to 2021, which encompassed 25 projects. We assessed data availability and provenance from 94 related papers and 39 datasets.

We compared cross-domain informed sampling methods, which prioritise selection based on diversity, uniformity, and representativeness metrics, with random and quasi-random sampling. These were implemented in an active learning environment, and their performance was assessed with learning curves.
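
As an illustration only, a comparison of this kind can be sketched as a pool-based loop in which each sampling strategy spends the same labelling budget and the test error is recorded after every acquisition, giving one learning curve per strategy. This is not the authors' implementation. The toy target function (a stand-in for an expensive simulation), the 1-nearest-neighbour model, the greedy max-min rule used as a simple diversity criterion, and all names (target, pick_random, pick_diverse, learning_curve) are assumptions made for this sketch.

```python
import math
import random


def target(x):
    """Hypothetical stand-in for an expensive simulation."""
    return math.sin(3.0 * x[0]) + 0.5 * x[1]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict_1nn(labelled, x):
    """Predict with the label of the nearest labelled point."""
    return min(labelled, key=lambda item: dist2(item[0], x))[1]


def rmse(labelled, test_set):
    """Root-mean-square error of the 1-NN model on a held-out set."""
    errs = [(predict_1nn(labelled, x) - y) ** 2 for x, y in test_set]
    return math.sqrt(sum(errs) / len(errs))


def pick_random(pool, labelled, rng):
    """Baseline: choose the next pool index uniformly at random."""
    return rng.randrange(len(pool))


def pick_diverse(pool, labelled, rng):
    """Greedy max-min selection: choose the pool point farthest from
    everything already labelled (a simple diversity criterion)."""
    if not labelled:
        return rng.randrange(len(pool))

    def gap(i):
        return min(dist2(pool[i], x) for x, _ in labelled)

    return max(range(len(pool)), key=gap)


def learning_curve(pick, budget=30, pool_size=300, seed=0):
    """Return the test RMSE after each of `budget` acquisitions."""
    rng = random.Random(seed)
    # Pool and test set are drawn before any selection, so strategies
    # run with the same seed see identical data.
    pool = [(rng.random(), rng.random()) for _ in range(pool_size)]
    test_points = [(rng.random(), rng.random()) for _ in range(200)]
    test_set = [(x, target(x)) for x in test_points]
    labelled, curve = [], []
    for _ in range(budget):
        x = pool.pop(pick(pool, labelled, rng))
        labelled.append((x, target(x)))  # one "simulation" spent
        curve.append(rmse(labelled, test_set))
    return curve


if __name__ == "__main__":
    for name, pick in [("random", pick_random), ("diversity", pick_diverse)]:
        curve = learning_curve(pick)
        # Report the error after every fifth acquisition.
        print(name, [round(e, 3) for e in curve[4::5]])
```

Because each acquisition stands for one computational experiment, the horizontal axis of such a curve can be read as cost, so two strategies can be compared by how much budget each needs to reach a given error.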
