The Stemformatics Virtual Lab: More than genomic data visualisation in the cloud

Mr Rowland Mosbergen1, Ms Isha Nagpal2, Mr Othmar Korn3, Ms Ariane Mora4, Mr Tyrone Chen5, Ms Chris Pacheco Rivera6, Professor Christine Wells7

1Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Australia, rowland.mosbergen@unimelb.edu.au

2Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Australia, isha.nagpal@unimelb.edu.au

3Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia, o.korn@uq.edu.au

4Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia, ariane.mora@uq.net.au

5Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Australia, tyrone.chen@unimelb.edu.au

6Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Australia, chris.pacheco@unimelb.edu.au

7The Walter and Eliza Hall Institute, The University of Melbourne, Melbourne, Australia & Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Australia, wells.c@unimelb.edu.au

ABSTRACT

STEMFORMATICS VIRTUAL LAB

Stemformatics [1] (www.stemformatics.org) is primarily a web-based pocket dictionary for stem cell biologists, running on the NeCTAR cloud [2]. Part of the stem cell community for over 6 years, it allows biologists to quickly and easily visualise their private datasets. They can also benchmark their datasets against 330+ high-quality, preprocessed public datasets.

In this presentation we will show that there is more to Stemformatics than just visualisation and how this is changing what end users expect from their online tools.

Stemformatics is also a collaboration tool. The first major collaboration was the multi-omics “Project Grandiose”, published with Stemformatics in December 2014, resulting in 2 Nature papers [3,4] and 3 Nature Communications papers [5,6,7]. Recently, LEUKomics, a blood cancer atlas, was created using the Stemformatics ecosystem [8]. Stemformatics is also part of the Bioplatforms Australia Stem Cell project.

Stemformatics curates public and proprietary data. To ensure community confidence we only allow high quality experiments to be made public. The failure rate of datasets processed by Stemformatics is currently 29.21%. The main reason to fail a dataset in Stemformatics is poor experimental design. The remaining datasets are high quality datasets that are valuable to the community for benchmarking and data mining.

Stemformatics has also started to mine these public datasets. The Rohart Mesenchymal Stromal Cell (MSC) Test [9] was the first publication that used Stemformatics public datasets to use machine learning to create a classification algorithm for MSCs.

Stemformatics is funded by a grant from the Australian Research Council Special Initiative in Stem Cell Science through Stem Cells Australia [10].

REFERENCES

  1. Wells CA et al Stemformatics: Visualisation and sharing of stem cell gene expression. Stem Cell Research, DOI http://dx.doi.org/10.1016/j.scr.2012.12.003
  2. NeCTAR research cloud website http://nectar.org.au/ accessed 6th of June 2017
  3. Hussein, S. M. I., Puri, M. C., Tonge, P. D. et al. Genome-wide characterization of the routes to pluripotency. Nature. DOI: 10.1038/nature14046
  4. Tonge, P. D. et al. Divergent reprogramming routes lead to alternative stem-cell states. Nature. DOI: 10.1038/nature14047
  5. Clancy, J. L. et al. Small RNA changes en route to distinct cellular states of induced pluripotency. Nature Communications. DOI: 10.1038/ncomms6522
  6. Lee, D-S. et al. An epigenomic roadmap to induced pluripotency reveals DNA methylation as a reprogramming modulator. Nature Communications. DOI: 10.1038/ncomms6619
  7. Benevento, M. et al. Proteome adaptation in cell reprogramming proceeds via distinct transcriptional networks. Nature Communications. DOI: 10.1038/ncomms6613
  8. LEUKomics article in Conversation: http://theconversation.com/how-big-data-is-being-mobilised-in-the-fight-against-leukaemia-74281 accessed 6th of June 2017
  9. Rohart et al. A molecular classification of human mesenchymal stromal cells. PeerJ. 2016 Mar 24;4:e1845. doi: 10.7717/peerj.1845. eCollection 2016.
  10. Stem Cells Australia website: http://www.stemcellsaustralia.edu.au/ accessed 6th of June 2017

Biography

Rowland Mosbergen is a project manager with 18 years’ experience in IT as a developer, analyst/programmer, team leader, software architect and small business owner.

Rowland has worked at the corporate level with large financial institutions like Merrill Lynch and National Australia Bank on their risk management systems and at the small business level to provide computer support. He has been working in the Stemformatics Virtual Lab for the last 6 years.

Applications of eResearch in Population Health

Sina Masoud-Ansari1, James Diprose2, Nick Young3, Mark Gahegan4, Richard Hosking5, Arron Mclaughlin6, Stefanie Vandevijvere7, Cliona Ni Mhurchu8, Andrew Jull9

1University of Auckland, Auckland, New Zealand, s.ansari@auckland.ac.nz

2University of Auckland, Auckland, New Zealand, j.diprose@auckland.ac.nz

3University of Auckland, Auckland, New Zealand, nick.young@auckland.ac.nz

4University of Auckland, Auckland, New Zealand, m.gahegan@auckland.ac.nz

5University of Auckland, Auckland, New Zealand, r.hosking@auckland.ac.nz

6University of Auckland, Auckland, New Zealand, amcl080@aucklanduni.ac.nz

7University of Auckland, Auckland, New Zealand, s.vandevijvere@auckland.ac.nz

8University of Auckland, Auckland, New Zealand, c.nimhurchu@auckland.ac.nz

9University of Auckland, Auckland, New Zealand, a.jull@auckland.ac.nz

INTRODUCTION

The University of Auckland’s Centre for eResearch has been collaborating with researchers from the School of Population Health on a number of projects over the past two years. This work highlights the changing landscape of research in this discipline and the need for technical support that offers deeper engagement. Beyond providing access to computing and data infrastructure, we see demand for software development expertise, in some cases to bring production-ready services to the public, as well as for expertise in data analysis and machine learning. We present our recent projects in this area, highlighting challenges and preliminary outcomes.

KIDS’CAM

Kids’Cam [1] is a novel dataset developed to understand the complex system of factors affecting childhood obesity. With over 1.3 million images and corresponding GPS records taken from wearable sensors, Kids’Cam provides a unique insight into the health environment of 169 school children in the Wellington Region, New Zealand. Researchers annotated these images by hand to understand the exposure to a range of health related factors such as food and food marketing. The Centre for eResearch explored the feasibility of using machine learning to automate the annotation process and developed workflows for processing the associated spatial data. The longer term vision for this work is to use Kids’Cam along with other datasets to create virtual labs for simulating the effects of policy, environment, behaviour and other factors in preventing obesity.

FOODBACK

Foodback is a mobile app that aims to empower people to create healthier community food places, through crowdsourcing of foods advertised and sold in and around local community settings, including: schools, medical centers, hospitals, supermarkets, takeaways and sport and recreation centers. Example screenshots of Foodback are shown in Figure 1. Foodback was developed through an iterative design process with key stakeholders where user workflows were prototyped and refined on paper, implemented, and then tested by the research group to gather feedback and improve the design. Data gathered with Foodback will be used to better understand the healthiness of the food landscape in New Zealand, and to help encourage and support local ‘change agents’ to make positive, healthy changes to foods advertised and sold in their settings.

Figure 1: The Foodback app

DIETCOST

Unhealthy diets contribute to obesity and diet-related non-communicable diseases (NCDs). The cost of food is a major determinant of food choices. The International Network for Food and Obesity/NCDs Research, Monitoring and Action Support (INFORMAS), coordinated at the University of Auckland School of Population Health, is developing methods to monitor the cost differential between healthy diets and less healthy, current diets in New Zealand and globally. The variation in the cost of diets is important but currently unknown. Many diet scenarios can be constructed using a list of commonly consumed foods to meet nutrient and food-based dietary guidelines (for ‘healthy’ diets) or specified nutrient and food intakes (for ‘current’ diets). This research involves developing a novel program to model the cost of the range of healthy and current diets using all the different combinations of a selection of commonly consumed foods, determined by a set of constraints for each food and by specified food and nutrient intakes. By taking into account the variation in the cost of diets, the program will help answer the question of whether a healthy diet is significantly cheaper or more expensive than the current, less healthy diet. If successful, this program can be franchised to other countries.
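The idea of enumerating diet scenarios under constraints and then looking at the spread of their costs can be illustrated with a minimal sketch. This is not the DietCost program itself: the food list, the prices and nutrient values, the serving options and the two constraints (an energy range and a sodium cap) are all hypothetical and chosen only to keep the example small.

```python
from itertools import product
from statistics import mean

# Hypothetical foods: (name, cost in cents, energy in kJ, sodium in mg), all per serving.
FOODS = [
    ("bread", 30, 400, 150),
    ("apple", 50, 300, 0),
    ("milk", 60, 500, 100),
]


def feasible_diet_costs(foods, energy_range, max_sodium, serving_options=(0, 1, 2, 3)):
    """Return the cost of every serving combination that satisfies the constraints."""
    low, high = energy_range
    costs = []
    # Try every combination of serving counts across the food list.
    for servings in product(serving_options, repeat=len(foods)):
        energy = sum(n * food[2] for n, food in zip(servings, foods))
        sodium = sum(n * food[3] for n, food in zip(servings, foods))
        if low <= energy <= high and sodium <= max_sodium:
            costs.append(sum(n * food[1] for n, food in zip(servings, foods)))
    return costs


def summarise(costs):
    """Summarise the variation in cost across all feasible diets."""
    return {"n": len(costs), "min": min(costs), "mean": mean(costs), "max": max(costs)}


if __name__ == "__main__":
    costs = feasible_diet_costs(FOODS, energy_range=(1500, 2500), max_sodium=400)
    print(summarise(costs))
```

Running the same enumeration with one constraint set for ‘healthy’ diets and another for ‘current’ diets would give two cost distributions that can then be compared. Exhaustive enumeration grows exponentially with the number of foods, so a realistic food list would likely need sampling or a constraint solver instead.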

CONCLUSION

Our collaborations show that there are exciting opportunities for eResearch professionals to support health researchers with novel methods for gathering and processing crowd-sourced data, and to help them extract meaning from disparate data sources. We have found that working closely with researchers at the research design stage to prove and test workflows is crucial for providing effective software and analytics support after data has been collected. While demands for specialised IT are not new in traditionally computational disciplines, we see an increasing need for app development and analytics support in health research, and a gap between industry-focused IT and research that eResearch units in universities are well placed to fill.

REFERENCES

1. Signal LN, Smith MB, Barr M, Stanley J, Chambers TJ, Zhou J, Duane A, Jenkin GLS, Pearson AL, Gurrin C, Smeaton AF, Hoek J, Ni Mhurchu C. Kids’Cam: An objective methodology to study the world in which children live. American Journal of Preventive Medicine 2017; published online April 25, 2017: http://doi.org/10.1016/j.amepre.2017.02.016.


Biography

I work as a Research IT Specialist in the University of Auckland’s Centre for eResearch.

Virtual Childhood Obesity Prevention Laboratory

Sina Masoud-Ansari1, Mark Gahegan2, Andrew Jull3, Cliona Ni Mhurchu4

1University of Auckland, Auckland, New Zealand, s.ansari@auckland.ac.nz

2University of Auckland, Auckland, New Zealand, m.gahegan@auckland.ac.nz

3University of Auckland, Auckland, New Zealand, a.jull@auckland.ac.nz

4University of Auckland, Auckland, New Zealand, c.nimhurchu@auckland.ac.nz

INTRODUCTION

Obesity is a complex system operating at many levels, containing a diverse set of actors, and operating via different mechanisms and operative pathways. These characteristics suggest the need for new and more dynamic methods to better understand determinants and identify solutions. Typically a “reductionist” approach has been taken in obesity research, which involves studying individual decontextualized risk factors that operate at one level only and don’t account for interrelatedness and reciprocity between exposures. In contrast, “systems thinking” suggests that complex, dynamic systems, which feature multiple interdependent components whose interactions may include feedback, non-linearity and lack of centralised control, are best understood holistically. A systems approach can thus complement other obesity research by adding dimensions that reductionist approaches cannot. The Kids’Cam [1] study provides a unique source of data to build a simulation model of New Zealand children’s food and activity environments. The dataset contains four days of data from 169 ethnically and socioeconomically diverse NZ children on where they go (GPS data), what they see and who they interact with (1.3 million images collected using automated, wearable cameras).

KIDS’CAM

The Kids’Cam data provide key insights into some of the exposures and interactions that are needed to build better simulation models. Information on children aged 11-13 years and their schools, homes and exposure to advertising and use of local food outlets can all be extracted from this data. Demographic data from the Kids’Cam participants can be used to assign characteristics to virtual children and GPS data collected on children’s movements during the study could be used to assign typical travel routes around a virtual neighbourhood. Our aim in this project was to develop, test and validate the recognition and extraction methods that are needed as a precursor to simulation modelling. The resulting datasets of GPS routes and exposure to food branding will provide the necessary starting point for building simulation models of exposure and using model experiments to assess the potential impact of new policies or interventions.

AUTOMATED CLASSIFICATION AND DATA INTEGRATION

The Centre for eResearch provided the expertise in automated image classification, data integration and spatial analysis. Manually annotating images in the Kids’Cam data is labour intensive due to the large number of images. We investigated the potential for automated classification of images to reduce the effort required to extract data for this and future data sets. The NVIDIA DIGITS deep learning framework was used to train classifiers to detect features of interest, such as whether the image was taken at school, in a supermarket or at home, and whether the image contains particular food or drink items. A sample of manually annotated data was used to train the classifier, and classification worked well for detecting the image setting in homogeneous environments such as public and private transport. Detection of heterogeneous environments such as a participant’s home or school proved more difficult, requiring more training data to reach sufficient accuracy. By combining information from the annotated images and the associated GPS records of children’s activities, we were able to create maps of exposure to food/drink items of interest. Figure 1 shows an example of the distribution of ‘fast-food’ exposure in Porirua, New Zealand. These datasets will create the foundation for future projects which aim to create simulation models of children’s activities and the effect of policy decisions on exposure and obesity.
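One way to picture the step of combining annotated images with GPS records is sketched below. It is an illustrative assumption, not the workflow used in the project: each image is matched to the nearest GPS fix in time, and matches are counted per grid cell. The record layouts, the `fast-food` label, the 30 second matching tolerance and the grid size are all hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter


def exposure_grid(images, gps, label, max_gap_s=30, cell_deg=0.001):
    """Count images carrying `label` per grid cell.

    images: iterable of (timestamp_s, set_of_labels)    -- hypothetical layout
    gps:    iterable of (timestamp_s, latitude, longitude) -- hypothetical layout
    """
    fixes = sorted(gps)
    times = [fix[0] for fix in fixes]
    counts = Counter()
    for t, labels in images:
        if label not in labels or not fixes:
            continue
        # Locate the GPS fixes immediately before and after the image time.
        i = bisect_left(times, t)
        candidates = fixes[max(i - 1, 0): i + 1]
        ts, lat, lon = min(candidates, key=lambda fix: abs(fix[0] - t))
        if abs(ts - t) > max_gap_s:
            continue  # no GPS fix close enough in time to trust the location
        # Snap the position to a grid cell so nearby exposures aggregate.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    gps = [(0, -41.2, 174.8), (100, -41.2, 174.9), (200, -40.5, 175.1)]
    images = [(5, {"fast-food"}), (150, {"fast-food"}), (199, {"fast-food", "home"})]
    print(exposure_grid(images, gps, "fast-food", cell_deg=1.0))
```

The per-cell counts could then be drawn as a heat map. A real analysis would also need to handle map projection, GPS noise and the de-identification of home locations, none of which is attempted here.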

Figure 1: Distribution of fast-food exposure in Porirua, New Zealand

REFERENCES

1. Signal LN, Smith MB, Barr M, Stanley J, Chambers TJ, Zhou J, Duane A, Jenkin GLS, Pearson AL, Gurrin C, Smeaton AF, Hoek J, Ni Mhurchu C. Kids’Cam: An objective methodology to study the world in which children live. American Journal of Preventive Medicine 2017; published online April 25, 2017: http://doi.org/10.1016/j.amepre.2017.02.016.


Biography

Research IT Specialist at the Centre for eResearch at the University of Auckland

eHealth Research Data Storage as a Service – a nexus between clinicians and researchers

Mr Mohammad Islam1

1Intersect Australia, Sydney, Australia, mohammad@intersect.org.au

BACKGROUND

Technological advances and rapid adoption in healthcare systems create significant opportunities to conduct further research with digital information from patients to address critical problems and provide better diagnosis or treatment outcomes. However, capturing human-derived research data from healthcare systems creates significant data management challenges that stem from both the scale and complexity of healthcare operations, IT infrastructure, policies and the associated sensitivities in maintaining the privacy of patients.

Research data differs from conventional enterprise data as it is multi-terabyte to petabyte scale data that is written and accessed frequently and/or infrequently, and shared amongst researchers. NSW Health has recognised that servicing research data differs from the standard allocate and increase model of traditional storage and has therefore acknowledged the need for an alternate storage service for researchers in the Local Health Districts (LHD) and broader medical research community. NSW Health’s approach is therefore to offer “Research Data as a Service (RDaaS)” via its eHealth NSW State Wide Information Service (SWIS) authentication and Health Wide Area Network (HWAN). RDaaS is the first proof point project of the NSW eHealth “Infrastructure As A Service Journey”. The service aims to enable research data interchange with higher education research institutions via a neutral third party organisation to an estimated 10,000 researchers, medical students, and clinicians.

Intersect supports a variety of researchers in Australia by offering IT products and services that can address RDaaS requirements that enable data interchange with the wider research community, particularly university researchers. Intersect’s offerings include collocated hierarchical petabyte scale data storage with high speed transfer mechanisms and data management platforms, large shared compute clusters and cloud computing as well as expert advice such that researchers can fully leverage the purpose-built infrastructure. Intersect offers physical, jurisdictional and sovereignty confidence within the Australian regulatory context. Intersect is also the lead node of med.data.edu.au, a nationally funded infrastructure for health and medical research data that aims to offer secure storage, co-located compute services, management of data and a data registry for dissemination of results and data discovery and access. The project has also developed a valuable resource library that provides information and guidance on legislation, codes, policies, best practice and IT security frameworks.

SUMMARY

In this presentation, we will discuss how Intersect is engaging with eHealth NSW, NSW Health researchers and clinicians, AARNet, the AAF, and other stakeholders to offer an integrated storage, compute, data management and discovery platform for the NSW Health RDaaS project. The presentation will cover the following activities and their challenges within the Research Storage as a Service proof point, namely:

  1. Proof of concept use cases
  2. HWAN and SWIS integration
  3. Service provisioning

Biography

Mohammad Islam is Data Science Manager at Intersect and spearheads Intersect’s digital data initiatives nationally. He works collaboratively with relevant stakeholders to develop, plan, manage and implement data services and eResearch initiatives for Intersect at a state and national level. Mohammad is closely involved in the national med.data.edu.au project, where he provides medical, health and biological expertise. He has a Master of Philosophy (MPhil) and a Ph.D. (submitted) in Bioinformatics from Macquarie University. Mohammad’s research background is in text mining, biological knowledge extraction, data integration, computational proteomics, and bioinformatics analysis. Mohammad previously held the role of Operations Manager with Intersect, and was responsible for the operational aspects of the research data storage, cloud, and supercomputing infrastructure. This expertise, alongside previous Support Manager roles, gives Mohammad a dual background in IT and research, which positions him well to understand researchers’ needs and identify appropriate data storage platforms and solutions for their research data challenges.

 

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
