Improving Predictive Machine Learning Using Wavelet Reconstructions

Rakib Hassan¹, John Wilford²

¹Geoscience Australia, Canberra, Australia, rakib.hassan@ga.gov.au

²Geoscience Australia, Canberra, Australia, john.wilford@ga.gov.au

 

‘Uncover’ Machine Learning

Uncover Machine Learning is an initiative at Geoscience Australia to exploit recent advances in machine learning as a predictive analytics tool to support mineral exploration in Australia. Uncover-ML, a codebase developed in collaboration with CSIRO’s Data61, implements Bayesian regression models for supervised learning and leverages a suite of clustering and regression algorithms from scikit-learn, a widely used open-source machine learning library.

The Uncover-ML codebase can be divided logically into three sets of modules that make up its machine learning pipeline: (1) Preprocessing, (2) Training and Prediction, and (3) Output Generation. The Preprocessing modules implement a suite of algorithms for transforming, filtering and manipulating high-resolution (~90 m), continental-scale raster data sets representing, e.g., topography, gravity and magnetics. The Training and Prediction modules expose machine learning algorithms that, during the training phase, consume raster and point data sets, known as covariates and targets, respectively. The last leg of the pipeline takes a trained model and generates probabilistic predictions, e.g., the likelihood of the occurrence of a mineral of interest at a given location. The pipeline is highly parallelized and optimized for predictive modelling on large national datasets.

Self-similarity of geophysical datasets

Many landscape and geophysical datasets, e.g., topography, drainage networks, magnetic intensity and earthquake epicenters, exhibit fractal patterns (Turcotte 1992). Fractal patterns show the same statistical properties at many different scales.

Figure 1: Drainage networks, as illustrated in this Landsat TM 8 image, are often used as an exemplar of fractal patterns.

However, machine learning algorithms are typically unable to exploit the self-similarity of input data sets at long wavelengths, such as the similarity of the branching patterns of the drainage system at different scales in Fig. 1. The targets and the corresponding covariate values used for training are point measurements or observations, so they do not capture neighborhood relationships. We capture these neighborhood relationships by generating several multiscale versions of each covariate using 2D wavelet reconstructions (Kalbermatten et al. 2012). By including these multiscale versions of each raster in the input data, we enable machine learning algorithms to embed these relationships into a model during the training phase.
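To make the point-sampling issue concrete, here is a minimal NumPy sketch. It is not Uncover-ML code. The helper names `sample_covariates` and `block_mean` are hypothetical. `block_mean` is only a crude stand-in for a multiscale covariate: for raster dimensions divisible by 2^L, averaging 2^L x 2^L blocks matches a Haar Level-L low-pass reconstruction. Each target row gains one column per scale, so two targets with similar pixel values can still be told apart by their surroundings.

```python
import numpy as np


def block_mean(raster, size):
    """Replace each size x size block of `raster` with the block mean.

    Hypothetical stand-in for a multiscale covariate; the raster dimensions
    must be multiples of `size`.
    """
    raster = np.asarray(raster, dtype=float)
    rows, cols = raster.shape
    if rows % size or cols % size:
        raise ValueError("raster dimensions must be multiples of size")
    blocks = raster.reshape(rows // size, size, cols // size, size).mean(axis=(1, 3))
    # Expand the block means back onto the original pixel grid.
    return np.repeat(np.repeat(blocks, size, axis=0), size, axis=1)


def sample_covariates(rasters, rows, cols):
    """Return an (n_points, n_rasters) feature matrix.

    `rasters` must share one pixel grid (e.g. a covariate followed by its
    multiscale versions); `rows`/`cols` are integer pixel indices of targets.
    """
    if len({np.shape(r) for r in rasters}) != 1:
        raise ValueError("all rasters must share the same pixel grid")
    rows, cols = np.asarray(rows), np.asarray(cols)
    # Fancy indexing picks one value per target from each raster.
    return np.column_stack([np.asarray(r)[rows, cols] for r in rasters])


elevation = np.arange(16, dtype=float).reshape(4, 4)
X = sample_covariates([elevation, block_mean(elevation, 2)], rows=[0, 3], cols=[0, 3])
print(X)  # rows: [0.0, 2.5] and [15.0, 12.5]
```

The second column carries information about each target's 2 x 2 neighborhood that a single pixel value cannot.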

We use PyWavelets, an open-source Python package, to decompose and reconstruct raster data using dyadic wavelet transforms, as shown in Fig. 2. We apply the following steps to decompose and reconstruct each raster into progressively longer-wavelength representations while preserving the original pixel resolution. This is an essential requirement for the machine learning pipeline, since all covariates must share a common pixel grid:

  1. Compute the 2D wavelet transform of the raster.
  2. Keep the low-pass (approximation) coefficients and set the horizontal, vertical and diagonal high-pass (detail) coefficients to zero.
  3. Compute the 2D inverse wavelet transform from the coefficients retained in step 2.
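The three steps map directly onto PyWavelets' single-level `pywt.dwt2` and `pywt.idwt2` functions. The sketch below illustrates them and is not Uncover-ML's implementation. The function name `lowpass_level1` and the default wavelet (`db2`) are assumptions, because the abstract does not say which wavelet is used.

```python
import numpy as np
import pywt


def lowpass_level1(raster, wavelet="db2"):
    """Level-1 low-pass reconstruction of a 2D raster (hypothetical helper)."""
    raster = np.asarray(raster, dtype=float)
    # Step 1: single-level 2D discrete wavelet transform.
    cA, (cH, cV, cD) = pywt.dwt2(raster, wavelet)
    # Step 2: keep the approximation coefficients and zero the horizontal,
    # vertical and diagonal detail coefficients.
    zeros = (np.zeros_like(cH), np.zeros_like(cV), np.zeros_like(cD))
    # Step 3: inverse transform back to full resolution.
    recon = pywt.idwt2((cA, zeros), wavelet)
    # idwt2 can return one extra row/column for odd-sized inputs; crop so the
    # output stays on the original pixel grid.
    return recon[: raster.shape[0], : raster.shape[1]]


smooth = lowpass_level1(np.arange(16, dtype=float).reshape(4, 4), wavelet="haar")
```

With the Haar wavelet, each 2 x 2 block of the output equals the mean of the corresponding input block. The top-left block of `smooth` is therefore approximately 2.5.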

The above steps produce a Level-1 representation of the original raster with the spatial wavelength doubled. Decomposing one level further (applying the wavelet transform again to the low-pass coefficients from step 1), then zeroing the high-pass coefficients at both levels before the inverse transform, yields a Level-2 representation of the original raster with the spatial wavelength quadrupled. Deeper decompositions produce successively longer-wavelength versions of a given raster.
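A minimal sketch of the multilevel case follows. It assumes PyWavelets' `pywt.wavedec2`/`pywt.waverec2` and a Haar default wavelet, and the helper names are hypothetical. Re-running the single-level steps on the full-resolution Level-1 output would, for orthogonal wavelets such as Haar and away from the raster edges, return essentially the same raster, because the low-pass projection is idempotent. The sketch therefore decomposes to deeper levels instead.

```python
import numpy as np
import pywt


def lowpass_reconstruction(raster, level, wavelet="haar"):
    """Level-`level` low-pass version of `raster` on its original grid."""
    raster = np.asarray(raster, dtype=float)
    # Each extra decomposition level halves the resolution of the
    # approximation coefficients, doubling the wavelength they represent.
    coeffs = pywt.wavedec2(raster, wavelet, level=level)
    # Keep only the coarsest approximation; zero every detail tuple.
    coeffs = [coeffs[0]] + [
        tuple(np.zeros_like(d) for d in details) for details in coeffs[1:]
    ]
    recon = pywt.waverec2(coeffs, wavelet)
    # Crop any extra row/column introduced for odd-sized rasters.
    return recon[: raster.shape[0], : raster.shape[1]]


def multiscale_stack(raster, max_level, wavelet="haar"):
    """Stack Level-1 ... Level-`max_level` versions along a new first axis."""
    return np.stack(
        [lowpass_reconstruction(raster, lv, wavelet) for lv in range(1, max_level + 1)]
    )


stack = multiscale_stack(np.arange(64, dtype=float).reshape(8, 8), max_level=2)
```

With Haar on an 8 x 8 raster, `stack[0]` averages 2 x 2 blocks and `stack[1]` averages 4 x 4 blocks. Both keep the 8 x 8 grid.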

We have incorporated this multiscaling functionality into the Preprocessing module of Uncover-ML, where it can be applied selectively to continuous (non-categorical) raster data. Preliminary prediction results obtained by including multiscale rasters in the training phase show improvements over models trained without them. With further testing and parameter tuning, we expect further improvements in predictive mapping capabilities.
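The selective application could look like the hypothetical sketch below. It is not Uncover-ML's configuration or API. The names `expand_covariates`, `categorical` and `lowpass` are assumptions. Categorical rasters are skipped because low-pass filtering blends neighboring class codes (e.g. averaging codes 1 and 3 gives 2), producing values that correspond to no class or to an unrelated one.

```python
import numpy as np


def expand_covariates(rasters, categorical, max_level, lowpass):
    """Add Level-1..Level-`max_level` versions of continuous rasters only.

    rasters: dict mapping name -> 2D array; categorical: names to leave as-is;
    lowpass: callable(raster, level) -> array on the same pixel grid.
    """
    out = {}
    for name, raster in rasters.items():
        out[name] = raster
        if name in categorical:
            # Smoothing class codes would create meaningless in-between values.
            continue
        for level in range(1, max_level + 1):
            out[f"{name}_L{level}"] = lowpass(raster, level)
    return out


rasters = {
    "dem": np.random.default_rng(0).normal(size=(4, 4)),
    "geology": np.array([[1, 1, 3, 3]] * 4),
}
# Identity placeholder; a real run would pass a wavelet low-pass function.
features = expand_covariates(rasters, {"geology"}, 2, lambda r, level: r)
print(sorted(features))  # ['dem', 'dem_L1', 'dem_L2', 'geology']
```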

REFERENCES

  1. Turcotte, D. L. (1992). Fractals, chaos, self-organized criticality and tectonics. Terra Nova, 4, 4-12. doi:10.1111/j.1365-3121.1992.tb00444.x
  2. Kalbermatten, M., et al. (2012). Multiscale analysis of geomorphological and geological features in high resolution digital elevation models using the wavelet transform. Geomorphology, 138(1), 352-363.
  3. Mallat, S. (2000). Une exploration des signaux en ondelettes. Paris: Les Éditions de l’École Polytechnique.

This paper is published with the permission of the CEO, Geoscience Australia


Biography:

Dr Hassan has worked as a computational software developer in both industry and academia since 2004. He obtained a bachelor’s degree in applied physics from RMIT University in 2003, a master’s degree in geoscience from Macquarie University in 2009 and, more recently, a PhD in computational geophysics from the University of Sydney in 2016.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.


© 2017 - 2018 Conference Design Pty Ltd