Multiscale Informatics: making the most of your data with machine learning

Dr. Amanda Parker¹

¹ANU, Canberra, Australia

Biography:

Since 2021, Dr. Amanda J. Parker has been a Research Fellow in the School of Computing at The Australian National University. Dr. Parker develops machine learning and artificial intelligence methods for the challenges posed by domain-specific scientific data. She conducts research in guided sampling, active learning and green machine learning, and is the lead Chief Investigator (CI) of a Humanising Machine Intelligence Computing for Social Good 2023 seed grant. For the preceding three years, Dr. Parker was a CSIRO Early Research Career Fellow in the Machine Learning Group. She graduated with a PhD in physics from The University of British Columbia in 2018, specialising in statistical physics, probability, soft matter, and computational science. In 2011, Dr. Parker was a Distinguished Visiting Fellow at IBM Research Almaden; her earlier studies were completed at Victoria University of Wellington.

Abstract:

There is strong motivation to apply machine learning (ML) methods across varied disciplines. However, in applied ML, the assumptions, constraints and goals can all differ from those of the settings for which the methods were developed: for example, small datasets vs big-data methods, prioritising model performance vs models optimised for speed, predominantly numeric data vs categorical approaches, and domain insight vs standard normalisation approaches. These mismatches can seem a steep, or even insurmountable, barrier to researchers considering whether ML methods can be applied to their work.

This workshop will address machine learning implementations and their assumptions, with a focus on small (100 to 100,000 instances) but high-dimensional tabular data. The workshop will cover four sections:

1. Data preprocessing assumptions.
2. Unsupervised learning: assessing the quality of a clustering result, non-clustering metrics that give insight into the data (uniformity, diversity, outlier detection), and dimension reduction for visualisation.
3. Cross-validation and learning curves for supervised learning.
4. Active learning and guided sampling methods to direct queries/experiments.
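As a rough illustration of the unsupervised learning section, the sketch below uses scikit-learn to compare k-means clusterings by silhouette score, flag candidate outliers with an isolation forest, and project the data to two dimensions with PCA. It is not taken from the workshop notebooks: the synthetic `make_blobs` dataset (standing in for real small, high-dimensional data), the choice of these particular methods, the range of k and the random seeds are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic stand-in (an assumption) for a small, high-dimensional tabular
# dataset: 300 instances, 50 features, 4 underlying groups.
X, _ = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)

# Standardise each feature to zero mean and unit variance. Whether this is
# appropriate for a given domain is itself a preprocessing assumption.
X_scaled = StandardScaler().fit_transform(X)

# Assess clustering quality for several k with the silhouette score
# (ranges from -1 to 1; higher means tighter, better-separated clusters).
silhouettes = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)
best_k = max(silhouettes, key=silhouettes.get)

# Flag candidate outliers with an isolation forest (-1 = outlier, 1 = inlier).
outlier_flags = IsolationForest(random_state=0).fit_predict(X_scaled)

# Project to two dimensions for plotting and report the variance retained.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("silhouette by k:", {k: round(float(s), 3) for k, s in silhouettes.items()})
print("best k by silhouette:", best_k)
print("flagged outliers:", int((outlier_flags == -1).sum()))
print("variance explained by 2 PCs:", round(float(pca.explained_variance_ratio_.sum()), 3))
```

A silhouette peak is only one view of cluster quality; on real data it is worth checking it against domain knowledge and against the 2D projection before trusting a particular k.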

The workshop will not cover the details of specific supervised or unsupervised learning methods (e.g. SVMs, random forests, k-means clustering). Some prior knowledge of these methods and experience with Python would be useful but is not required or assumed. Jupyter notebooks will be provided, using the Python packages scikit-learn and pandas. Ideally, participants will have Python installed locally, but an online environment can be made available if preferred.
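The cross-validation and active learning sections can be sketched with the same scikit-learn and pandas stack. The example below is a hypothetical illustration rather than workshop material: it computes a cross-validated learning curve for a random forest on synthetic data, then runs a simple pool-based active learning loop that queries the least-confident instance (uncertainty sampling, one of several possible query strategies). The dataset, model, seed-set size and query budget are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic stand-in (an assumption) for a small tabular dataset.
X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)

# Learning curve: cross-validated accuracy as the training set grows.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=cv,
)
curve = pd.DataFrame({
    "n_train": sizes,
    "train_mean": train_scores.mean(axis=1),
    "val_mean": val_scores.mean(axis=1),
    "val_std": val_scores.std(axis=1),
})
print(curve.round(3))

# Pool-based active learning: labels in the pool are "revealed" only when queried.
rng = np.random.default_rng(0)
X_pool, y_pool = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]  # held-out set to track accuracy

# Seed set: 5 labelled instances per class so both classes are present.
labelled = [int(i) for c in (0, 1)
            for i in rng.choice(np.flatnonzero(y_pool == c), size=5, replace=False)]
history = []
for step in range(20):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_pool[labelled], y_pool[labelled])
    history.append((len(labelled), model.score(X_test, y_test)))
    # Least-confidence score: 1 minus the highest predicted class probability.
    unlabelled = np.setdiff1d(np.arange(len(X_pool)), labelled)
    proba = model.predict_proba(X_pool[unlabelled])
    uncertainty = 1.0 - proba.max(axis=1)
    # Query the single most uncertain instance and add it to the labelled set.
    labelled.append(int(unlabelled[np.argmax(uncertainty)]))

al = pd.DataFrame(history, columns=["n_labelled", "test_accuracy"])
print(al.round(3))
```

A validation score that is still rising at the largest training size suggests more data would help. Comparing the active learning accuracy trace against random queries with the same labelling budget is a common check on whether guided sampling is paying off.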
