1The University of Queensland, Brisbane, Australia, email@example.com
The use of accelerators (GPU, ASIC, FPGA) in research computing has become more prevalent as hardware/software ecosystems have matured. To complement this, frameworks from vendors such as nVidia and AMD have become fully-featured. As a result of a series of significant ARC/NHMRC grants – an unprecedented amount of scientific imaging infrastructure is being commissioned on the University of Queensland St Lucia campus. To leverage scientific outputs and process the data that this new infrastructure would generate, UQ procured its first tranche of accelerated computing capability in late 2017. This presentation discusses the user-engagement strategy behind UQ’s accelerated computing deployment, how it worked, why it worked and why it was a novel approach in the sector.
In late 2017, after an extensive benchmarking, analysis and design process, the Wiener supercomputer was procured to enable near real time deconvolution and deskew from imaging infrastructure, such as UQ’s new Latice Light Sheet Microscope (LLSM). This platform was the first in the Asia Pacific to feature the nVidia Volta V100 GPU and only the fourth production deployment in the world. The Wiener supercomputer was the largest investment in GPU/accelerated supercomputing that the state had ever made. The initial intention of Wiener was to provide a powerful means of deconvolution  to the LLSM, but it was quickly realised that with this many GPU’s connected tightly in a dedicated supercomputing deployment, the platform would serve as UQ’s launchpad for a general accelerated computing strategy.
basis of advanced computing strategy
UQ, as with several of its contemporaries has a significant investment in supercomputing. UQ’s strategy differs somewhat from its equivalent national and sister-state facilities in that it provides different pillars of supercomputing for different workloads in dedicated infrastructure.
Table 1: UQ’s Supercomputing Infrastructure load-out
|Platform name||Machine domain focus||Workload characterisation||Expected user demand||Actual user demand|
|Tinaroo||Multi-discipline||MPI, tightly coupled shared memory, massively parallel||High||High|
|Awoonga||Multi-discipline||Loosely coupled, MPI-slack, high latency, cloud-like.||Medium||Medium|
|FlashLite||Multi-discipline||High throughput, high memory||High||Low|
|Wiener||Multi-discipline||GPU, ML, DL, CNN and imaging specific.||Low||High|
UQ misjudged the user demand for both FlashLite and Wiener, but for different reasons, which strategic discussion in this presentation will explain and articulate.
Fostering an accelerated computing community
In the initial, as can be seen in Table 1, UQ made some assumptions about where it thought the most user demand would be, which proved incorrect. This lead to initial interest in Wiener being far more profound than first anticipated. UQ expected that Wiener would cater to a niche subset of imaging workloads, but what was unanticipated was the level of sophistication and understanding of application of convolutional neural networks, deep learning and machine learning techniques in the domain of imaging itself. An example was our overt expectation that deconvolution algorithms would run against the GPU infrastructure using codes such as Microvolution and SVI’s Huygens. The truth was, researchers were already considering using machine vision techniques and TensorFlow at scale to characterise and train image sets for more accurate detection of cells, cancers and viruses. 
At this point, UQ rationalised that it needed to take a more direct approach in engagement and collaboration with end users to effectively liberate the capability of this new platform. A core tenant of this was a personal and one on one approach to each workload. Whilst this is an administrative burden, it has been demonstrated that it delivers significantly better outcomes. Thus, the general ‘onboarding’ process to Wiener, from an early point of production state became the following process:
- User approaches RCC with a request for compute time on accelerator based HPC.
- A subject matter (computer science, HPC) expert will then make an appointment to meet with the researcher or research group in order to better understand the science.
- A longer discussion takes place, to learn about the workload type, the potential hardware/software and computing environment impact. At this point the researcher and subject matter expert work towards a defined job-layout which is both optimal for the workload and best fit for infrastructure.
The initial consultation process generally takes between two to three hours.
UQ has empirical and measured evidence to suggest this method of personal interaction to breed a stronger capability in accelerated computing creates a far more efficient use of infrastructure, than the generally accepted process of providing a user a set of documents, readme’s and how-to instructions at a distance.
Early analysis suggests that there is a correlation between the employment of direct consultation and scientific discussion between a domain expert (in the scientific research domain) and a research computing specialist and the quality of the computational run or input in these accelerated computing platforms. This now forms the basis of the operating procedures of the Wiener supercomputing facility.
- UQ IMB ARC/NHMR Lattice Light Sheet Microscopy installation. Retrieved from https://imb.uq.edu.au/article/2016/11/45-million-imb-led-discovery-research, accessed June 8th, 2018
- Deconvolution Definition, Retrieved from https://en.wikipedia.org/wiki/Deconvolution, accessed June 8th, 2018.
- HPC Wiener harnessed for automating skin cancer diagnosis, Retrieved from https://rcc.uq.edu.au/article/2018/05/hpc-wiener-harnessed-automating-skin-cancer-diagnosis, accessed June 8th, 2018.
Jake is currently the Associate Director of Research Computing for UQ’s three large scientifically intensive research institutes – the Australian Institute for Bioengineering and Nanotechnology, the Institute for Molecular Bioscience and the Queensland Brain Institute.
Jake has spent the last 12 years in scientific computing, working on everything from building supercomputers to managing the strategy and complexity that comes with scientific endeavour.
Jake spends his time working to make scientific computing platforms, technology and infrastructure as good as it can be, such that world class research can be conducted, unencumbered.
Jake’s background is in both computer science and business leadership – constantly fighting with himself, trying to accommodate both (very different concepts) in his working life – ultimately to try and make them work together.