MDbox: a cloud-based repository and analysis toolkit for molecular dynamics simulations

Karmen Condic-Jurkic1, Mark Gregson2, Steven De Costa2

1The Australian National University, Canberra, Australia
2Link Digital, Canberra, Australia


Computational modelling has become an integral tool in almost every branch of science, including chemistry and biology. Computational chemistry methods are now widely used to provide a better understanding of molecular processes at the atomistic level, complementing experimental findings. Molecular dynamics (MD) simulations are a powerful technique for studying molecular structure and function: they follow the movement of atoms over time by solving the classical equations of motion. MD simulations are computationally demanding and time-consuming, often requiring access to supercomputers and significant scientific input. Unfortunately, the primary data generated in the process (trajectories) are rarely made publicly available beyond the analysis presented in publications and supporting information; they remain stored locally on hard drives or private servers without public access. Considering the human and computational resources used to generate them, these trajectories represent a very valuable asset for molecular studies, especially in the biomolecular and materials sciences. General-purpose repositories such as Figshare, the Open Science Framework (OSF) and Zenodo already allow research data to be hosted, but to the best of our knowledge there is no publicly available repository dedicated exclusively to hosting and managing data generated by MD simulations. A specialised, centralised repository would bring many benefits, including standardised data description and access to large-scale data analysis.
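The core loop of an MD engine, stepping the classical equations of motion forward in time, can be sketched with the velocity Verlet integrator, a scheme widely used for this purpose. This is a minimal illustration, not MDbox code: a single particle in a one-dimensional harmonic well (the `force` function and all parameters are hypothetical) stands in for the many-atom force field of a real simulation, and each stored `(x, v)` pair plays the role of one trajectory frame.

```python
import math
from typing import Callable, List, Tuple


def velocity_verlet(
    x: float,
    v: float,
    force: Callable[[float], float],
    mass: float,
    dt: float,
    n_steps: int,
) -> List[Tuple[float, float]]:
    """Integrate m * a = F(x) with velocity Verlet; return (x, v) for every step."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position with current velocity and acceleration
        a_new = force(x) / mass              # evaluate the force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Hypothetical toy system: unit mass on a unit spring, released from x = 1 at rest.
k, m = 1.0, 1.0
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
print(f"frames: {len(traj)}, final x: {traj[-1][0]:.4f}, exact: {math.cos(10.0):.4f}")
print(f"total-energy spread: {max(energies) - min(energies):.2e}")
```

Real simulations apply the same update to thousands of atoms in three dimensions, with far more expensive force evaluations, which is why their stored trajectories are large and costly to regenerate.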

MDbox is envisioned as a specialised open access repository for MD simulation datasets. MDbox aims to provide a platform for sharing trajectories and their corresponding input files, which should improve the documentation of commonly used protocols and enhance the replicability and reproducibility of simulations [1]. MDbox can be used for research data management and serve as a long-term storage solution for users. It will make collaboration and data exchange easier and provide an alternative way of making research publicly available and citable. A well-designed metadata schema [2,3] will improve discoverability, and the HDF5 file format [4] can be used to store all the relevant simulation data in a single file, further simplifying data search and analysis. We are currently developing a prototype of the repository and are looking to engage the community and work with potential users to help shape the future development of this platform.
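To make the point about standardised metadata concrete, the sketch below defines a minimal structured record for one MD dataset and a simple field-based search over a collection of such records. The class name, its fields, the example values and the `search` helper are all hypothetical illustrations; they are not the MDbox schema, which is still under development, nor the data models in references [2-4].

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TrajectoryRecord:
    """Hypothetical minimal metadata for one MD dataset (not the MDbox schema)."""
    title: str
    md_engine: str             # simulation package, e.g. "GROMACS"
    force_field: str
    n_atoms: int
    timestep_fs: float         # integration time step in femtoseconds
    n_frames: int              # frames stored in the trajectory
    frame_interval_ps: float   # simulated time between stored frames
    input_files: List[str] = field(default_factory=list)

    @property
    def simulated_time_ns(self) -> float:
        # Time covered by the stored frames, with the first frame at t = 0.
        return (self.n_frames - 1) * self.frame_interval_ps / 1000.0

    def to_json(self) -> str:
        data = asdict(self)
        data["simulated_time_ns"] = self.simulated_time_ns
        return json.dumps(data, indent=2, sort_keys=True)


def search(records: List[TrajectoryRecord], **criteria) -> List[TrajectoryRecord]:
    """Return records whose fields equal every given criterion."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]


records = [
    TrajectoryRecord("Transporter in lipid bilayer", "GROMACS", "CHARMM36",
                     120_000, 2.0, 5001, 100.0, ["topol.top", "md.mdp"]),
    TrajectoryRecord("Enzyme in water", "AMBER", "ff14SB",
                     45_000, 2.0, 2001, 50.0, ["prmtop", "md.in"]),
]
print([r.title for r in search(records, force_field="CHARMM36")])
print(records[0].to_json())
```

Because every record carries the same fields, a query such as "all CHARMM36 simulations longer than 100 ns" reduces to a simple filter; a real repository would index these fields rather than scan a list.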

In our information-driven era, the open data approach is of great value for the further development of computational modelling and for cross-disciplinary researchers in both academia and industry. The growing movement to open up data produced by publicly funded research provides an additional incentive. However, the most exciting prospects for MDbox come in the form of new research opportunities and the advancement of molecular modelling, ranging from new analytics tools for large datasets to machine learning techniques. Artificial intelligence is spreading rapidly and offers exciting opportunities in almost every area of human activity; it is expected to have a major impact on medicine, drug design, protein engineering and the creation of new materials. These methods require large, curated datasets to produce informative and valuable results: MDbox will provide exactly this.


1. Hinsen, K., A data and code model for reproducible research and executable papers. Procedia Comput. Sci., 2011. 4, p. 579-588.
2. Hinsen, K., MOSAIC: A data model and file formats for molecular simulations. J. Chem. Inf. Model., 2014. 54(1), p. 131-137.
3. Thibault, J.C., Facelli, J.C., and Cheatham III, T.E., iBIOMES: managing and sharing biomolecular simulation data in a distributed environment. J. Chem. Inf. Model., 2013. 53(3), p. 726-736.
4. de Buyl, P., Colberg, P.H., and Höfling, F., H5MD: a structured, efficient, and portable file format for molecular data. Comput. Phys. Commun., 2014. 185(6), p. 1546-1553.


Karmen Condic-Jurkic earned her Master's degree in chemistry at the University of Zagreb, Croatia, in 2006. She then worked at the Rudjer Boskovic Institute in Zagreb as a research assistant, before completing a PhD in computational chemistry and biophysics, awarded by Friedrich-Alexander University (Erlangen, Germany) in 2013. The same year she joined The University of Queensland in Brisbane as a postdoctoral researcher, staying there until October 2015, when she moved to the Australian National University in Canberra.

During her PhD, her research focused mainly on molecular modelling of radical enzymes and their mechanisms using various computational methods, including quantum mechanics (QM) methods, molecular dynamics (MD) simulations and hybrid QM/MM techniques. Her postdoctoral research has focused mainly on the structure and function of membrane proteins implicated in multidrug resistance, using classical MD simulations as the primary tool.

Outreaching, collaborating and connecting: Specialised services supporting researchers for the Performance-Based Research Fund (PBRF)

Dahlia Han1, Simon Esling2

1Libraries and Learning Services of the University of Auckland, Auckland, New Zealand
2Libraries and Learning Services of the University of Auckland, Auckland, New Zealand



Libraries; Research; Support; Collaboration; Funding; Software; Relationships; Outreach; Bibliometrics; Repository; Author identifier systems.


The Performance-Based Research Fund (PBRF), launched in 2002, is a New Zealand tertiary education research funding process that assesses the research performance of Tertiary Education Organisations (TEOs) and funds them on the basis of their performance [1]. In the last three evaluation rounds (conducted in 2003, 2006 and 2012), Subject Librarians mostly answered queries about research impact metrics in response to individual requests from researchers, and support for researchers and their PBRF submissions at the University of Auckland was inconsistent between faculties. This paper presents the University Library’s new initiative to provide specialised PBRF services across all faculties. The Library has identified multiple resources required to support the University in the PBRF process and has formalised the University’s PBRF support services for the 2018 evaluation round. This paper also shares the preparation and practices of PBRF Specialist Subject Librarians in two faculties, Engineering and Education, and highlights the importance of outreach, collaboration and connection with key faculty PBRF personnel (PBRF Coordinators and Associate Deans) and faculty researchers. Services include training and consultations on online tools such as the Research Outputs system (Symplectic), SciVal (Scopus) and InCites (Web of Science), and on author profiles such as Scopus Author ID, Google Scholar Profile and ORCID. Repository uploading, Open Access, publication verification, bibliometrics and citation analyses, social media and research impact are also discussed with respect to PBRF submissions and overall researcher success. As a result, the Library is building excellent relationships across multiple faculties by proving its value to University of Auckland researchers through the new PBRF specialist service.
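Research impact metrics of the kind these consultations cover are usually simple functions of citation counts. As one example, the h-index shown in Scopus, Web of Science and Google Scholar author profiles can be computed as below. This is a minimal sketch with made-up citation counts; values from different databases generally differ because each counts only the citations in its own index.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that at least h outputs have at least h citations each."""
    ranked = sorted(citations, reverse=True)   # most-cited output first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank                           # the top `rank` outputs all have >= rank citations
        else:
            break                              # counts only fall from here, so h is final
    return h


# Hypothetical citation counts for one researcher's outputs.
print(h_index([25, 8, 5, 3, 3, 0]))  # → 3
```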


1. Tertiary Education Commission Performance-Based Research Fund. Available from:, accessed 13 June 2017.



Dahlia Han has worked in academic libraries in different roles for more than 30 years, 17 of which were as an Engineering Subject Librarian at the University of Auckland Libraries and Learning Services. Her current role is PBRF Specialist Subject Librarian for the Faculty of Engineering and Auckland Bioengineering Institute (ABI), while she is also the Subject Librarian for the Department of Civil and Environmental Engineering and ABI. Her main interests include academic and information literacy (AIL), research support in bibliometrics including altmetrics, research impact, author identifier systems, open access, research data management and scholarly communication. Her ORCID ID:

Simon Esling has worked at the University of Auckland Libraries and Learning Services for over 20 years, across a number of library divisions and in different roles. His current role is PBRF Specialist Subject Librarian for the Faculty of Education. His major interest is working with staff and students to develop their research skills. In recent years this work has increasingly focused on postgraduate and academic success, and on identifying open access, social media and research impact as important areas for linking academe with the public.

Software Engineering – Visualisation of a complex model. Using CSIRO’s TAPPAS as an example, present the key challenges and success factors in engineering data visualisations.

Mr Craig Hamilton1

1Intersect Australia, Sydney, Australia



TAPPAS (Tool for Assessing Pest and Pathogen Airborne Spread) is an online tool for modelling the dispersal of living organisms, developed through a partnership between CSIRO, the Bureau of Meteorology (BOM) and Intersect. TAPPAS takes global air circulation data from the BOM’s numerical weather prediction model and uses the HYSPLIT dispersion system to compute simple air parcel trajectories. It combines these trajectories with knowledge of the organism’s biology and presents the results as risk maps in an easy-to-use interface.
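The idea of an air parcel trajectory can be illustrated with a predictor-corrector (Heun) advection step: move the parcel with the wind at its current position, re-sample the wind at that first-guess position, then move it with the average of the two velocities. This is a simplified sketch, not the HYSPLIT implementation or TAPPAS code; it assumes a flat x/y plane in kilometres and a hypothetical `wind(x, y, t)` function in km/h in place of gridded BOM model output.

```python
from typing import Callable, List, Tuple

# Hypothetical wind field: (x_km, y_km, t_hours) -> (u, v) in km/h.
WindField = Callable[[float, float, float], Tuple[float, float]]


def advect(x: float, y: float, wind: WindField, t0: float,
           dt: float, n_steps: int) -> List[Tuple[float, float]]:
    """Trace an air parcel through a wind field with a Heun predictor-corrector step."""
    path = [(x, y)]
    t = t0
    for _ in range(n_steps):
        u1, v1 = wind(x, y, t)                  # wind at the current position
        xp, yp = x + u1 * dt, y + v1 * dt       # predictor: first-guess position
        u2, v2 = wind(xp, yp, t + dt)           # wind at the first guess, one step later
        x += 0.5 * (u1 + u2) * dt               # corrector: average the two velocities
        y += 0.5 * (v1 + v2) * dt
        t += dt
        path.append((x, y))
    return path


# Toy example: a westerly that strengthens over time, u = 10 + 2t km/h, for 24 hourly steps.
track = advect(0.0, 0.0, lambda x, y, t: (10.0 + 2.0 * t, 0.0), t0=0.0, dt=1.0, n_steps=24)
print(track[-1])
```

A risk map would then combine many such trajectories, released from source regions over many days, with the organism's biology.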

In five minutes we will cover some of the key challenges and successes of this project from an engineering perspective, and show a couple of the dispersion visualisations.

Given the growing demand for, and importance of, data visualisation, this presentation aims to help other delegates understand some of the key success factors in engineering visual data from complex models.


Craig has over 20 years’ experience in software engineering, architecture and product management in higher education, as well as in local and global private companies. From architecting and building the number one Australian online shopping site in the early 2000s to developing global identity management programs for over 20 million users, Craig has designed and built systems that solve unique and complex problems involving adoption, scalability and security. As engineering manager of Intersect Australia for the last year, Craig has overseen the delivery and development of a number of research software engineering products, such as TAPPAS and CloudStor Collections.

Meeting the Big Science Needs of the SKA: What NRENs Can Do and the Internet Cannot

Mr Peter Elford1, Mr Tim Rayner2, Mr Chris Myers3

1AARNet, Canberra, Australia
2AARNet, Canberra, Australia
3AARNet, Melbourne, Australia



The scale of the SKA [1] represents a huge leap forward in the engineering needed to deliver a unique instrument (a radio telescope) as part of an international collaboration. The SKA will generate, process and store enormous quantities of data, and AARNet has been working on several efforts to ensure this volume of data reaches the hands and systems of the science community. This talk will focus on work undertaken in partnership with GEANT and others [2] to prove network throughput from the AARNet backbone and the MRO [3] in Australia to important research facilities in Europe, such as the GEANT backbone and ASTRON, as well as to the USA. The tests were conducted with hosts connected at 10 Gbps and 100 Gbps, and demonstrate the network throughput capabilities between AARNet and the wider NREN [4] community. Notably, tests conducted over network paths through the commercial Internet delivered very poor results.
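Two standard back-of-the-envelope calculations help explain results like these. The bandwidth-delay product gives the amount of data a sender must keep in flight to fill a long, fast path, and the Mathis et al. approximation bounds the throughput of a single loss-based TCP flow as packet loss rises. The sketch assumes a round-trip time of about 300 ms between Australia and Europe and illustrative loss rates; these are not measurements from the tests described above, and congestion-control algorithms that do not react primarily to loss behave differently.

```python
import math


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return rate_bps * rtt_s / 8


def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. estimate of steady-state throughput for one loss-based TCP flow."""
    c = math.sqrt(3 / 2)  # constant from the simple periodic-loss model
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)


RTT = 0.3  # assumed Australia-Europe round-trip time in seconds (illustrative)
for rate in (10e9, 100e9):
    print(f"{rate / 1e9:.0f} Gbps path needs ~{bdp_bytes(rate, RTT) / 1e6:,.0f} MB in flight")
for loss in (1e-9, 1e-5):
    print(f"loss {loss:g}: ~{mathis_bps(1460, RTT, loss) / 1e6:,.1f} Mbps per flow")
```

On these assumptions, even one lost packet in 100,000 caps a single flow at roughly 15 Mbps at this distance, which is consistent with the poor results over commercial Internet paths.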

This lightning talk specifically relates to the Generating, Collecting and Moving Data theme.

[1] Square Kilometre Array –
[2] “Taking it to the limit – testing the performance of R&E networking” –
[3] Murchison Radio Observatory –
[4] NREN – National Research and Education Network


Peter Elford manages AARNet’s relationships across a broad range of federal and state government agencies, and AARNet’s engagement with the Australian research community. He is a strong and passionate advocate for the role Information and Communications Technology (ICT) plays in enabling globally collaborative and competitive research through ultra-high-speed broadband connectivity. Peter is an ICT professional with over 30 years’ experience in the government, education, research and industry sectors, having worked at the Australian National University, AARNet (twice) and Cisco. In his first stint at AARNet (in 1990), he engineered much of the original Internet in Australia.

The Indigo Subsea Fibre System: eResearch Infrastructure in the Asian Century

Mr Peter Elford1

1AARNet, Yarralumla, Australia



AARNet has entered into a consortium with Google, Indosat Ooredoo, Singtel, SubPartners and Telstra to build a new international subsea cable system connecting Singapore and Australia. Known as Indigo, the system will use coherent optical technology and spectrum sharing to deliver a minimum capacity of 18 terabits per second on each of its two fibre pairs. The broadband capacity secured will meet future growth in collaborative research and transnational education between Australia and our Asian partners for decades to come. This is the first time a National Research and Education Network (NREN) has entered into direct ownership of a subsea cable system, and it was achieved without direct Commonwealth funding.
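As a rough, idealised illustration of what the stated minimum capacity means, the arithmetic below totals the two fibre pairs and estimates how long a hypothetical 1 PB dataset (taken as 10^15 bytes) would take to cross the system if all of that capacity were available to it without protocol overhead. In practice the capacity is shared among the consortium members, so this is an upper-bound sketch only.

```latex
C_{\text{total}} = 2 \times 18~\text{Tb/s} = 36~\text{Tb/s}
\qquad
t = \frac{1~\text{PB} \times 8~\text{bit/B}}{C_{\text{total}}}
  = \frac{8 \times 10^{15}~\text{bit}}{3.6 \times 10^{13}~\text{bit/s}}
  \approx 222~\text{s}
```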

This lightning talk specifically relates to the Generating, Collecting and Moving Data theme, and highlights an outstanding example of national, sustainable, underpinning e-Infrastructure.


Peter Elford manages AARNet’s relationships across a broad range of federal and state government agencies, and AARNet’s engagement with the Australian research community. He is a strong and passionate advocate for the role Information and Communications Technology (ICT) plays in enabling globally collaborative and competitive research through ultra-high-speed broadband connectivity. Peter is an ICT professional with over 30 years’ experience in the government, education, research and industry sectors, having worked at the Australian National University, AARNet (twice) and Cisco. In his first stint at AARNet (in 1990), he engineered much of the original Internet in Australia.

Calcyte: A simple tool for describing, packaging and publishing data collections

Dr Peter Sefton1

1University of Technology Sydney, Ultimo, Australia



Calcyte is a toolkit for managing metadata for collections of any kind of file-based data, using spreadsheets (automatically generated from templates) for data entry; other methods may be supported in future. After the data owner enters information about the files and directories, Calcyte generates a static web page and metadata files that describe the data in both human- and machine-readable formats. Calcyte’s output can be published on a web server or zipped for distribution. Calcyte implements the proposed DataCrate format. Calcyte is a Python program, which can be run from the command line or via automated processes that detect changes in data on file shares.
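The spreadsheet-to-metadata flow described above can be sketched as follows: parse a sheet of file descriptions, build one machine-readable description, and render a static HTML page from the same data. This is a minimal illustration assuming a CSV sheet with hypothetical `path`, `name` and `description` columns and schema.org-style JSON-LD terms; it is not Calcyte's code or the DataCrate specification.

```python
import csv
import html
import io
import json
from typing import Dict, List


def read_sheet(text: str) -> List[Dict[str, str]]:
    """Parse a metadata sheet (hypothetical columns: path, name, description)."""
    return list(csv.DictReader(io.StringIO(text)))


def build_catalog(title: str, rows: List[Dict[str, str]]) -> dict:
    """Machine-readable description using schema.org terms, JSON-LD style."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "hasPart": [
            {"@id": r["path"], "@type": "MediaObject",
             "name": r["name"], "description": r["description"]}
            for r in rows
        ],
    }


def render_html(catalog: dict) -> str:
    """Human-readable static page generated from the same metadata (all values escaped)."""
    title = html.escape(catalog["name"])
    items = "\n".join(
        f'<li><a href="{html.escape(p["@id"])}">{html.escape(p["name"])}</a>: '
        f'{html.escape(p["description"])}</li>'
        for p in catalog["hasPart"]
    )
    return f"<html><head><title>{title}</title></head><body><h1>{title}</h1><ul>\n{items}\n</ul></body></html>"


sheet = ("path,name,description\n"
         "data/site_a.csv,Site A,Hourly sensor readings\n"
         "data/notes.txt,Notes,Field notes (x < y)\n")
catalog = build_catalog("Sensor survey", read_sheet(sheet))
print(json.dumps(catalog, indent=2))
print(render_html(catalog))
```

Generating both outputs from one in-memory description keeps the human- and machine-readable views consistent; writing them to files for publication is left out of the sketch.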


The presentation will include a demo of using Calcyte to describe a small dataset, with commentary on its important features, and a demonstration of how it has been used to publish data at UTS.

Calcyte produces human- and machine-readable metadata in a format with the working title “DataCrate”. The UTS team is planning a beta release of both Calcyte and DataCrate in time for eResearch Australasia.

Calcyte is available from:


Calcyte has been programmed by Peter Sefton and Michael Lake, and tested by the team at UTS eResearch, including Sharyn Wise and Michael Lynch.


Peter Sefton is the Manager, eResearch Support at the University of Technology Sydney (UTS). Before that he was in a similar role at the University of Western Sydney (UWS). Previously he ran the Software Research and Development Laboratory at the Australian Digital Futures Institute at the University of Southern Queensland (USQ). Following a PhD in computational linguistics in the mid-nineties, he has gained extensive experience in the higher education sector leading the development of IT and business systems to support both learning and research.

While at USQ, Peter was involved in the development of institutional repository infrastructure in Australia via the federally funded RUBRIC project, and was a senior advisor to the CAIRSS repository support service from 2009 to 2011. He oversaw the creation of one of the core pieces of research data management infrastructure to be funded by the Australian National Data Service, consulting widely with libraries, IT, research offices and eResearch departments at a variety of institutions in the process. The resulting open source research data catalogue application, ReDBox, is now widely deployed at Australian universities.

At UTS, Peter is leading a team that is working with key stakeholders to implement university-wide eResearch infrastructure, including an institutional data repository, as well as collaborating widely with research communities at the institution on specific research challenges. His research interests include repositories, digital libraries and the use of the Web in scholarly communication.

The NLeSC eScience users’ survey: learnings from actually asking actual users about actual use

Mr Guido Aben1

1AARNet, Kensington, Australia


Consensus is developing among eScience policy makers, both domestically and overseas, that future eScience policy must contain provisions to evaluate eScience deliverables more robustly, both immediately upon delivery and at set intervals afterwards. The suggested indicators typically include metrics of user acceptance, tool penetration and similar “social” values. Until now, however, few institutions or service providers (let alone countries) have executed large-scale surveys to gather baseline data on the performance, acceptance and penetration of their existing portfolios. Nor have large-scale surveys (as opposed to one-on-one interviews, the more traditional method) been carried out canvassing expectations and predictions about research infrastructure users (across domains) and usage (at all capacity levels).

We are aware of one exception: the 2016 Netherlands eScience Centre survey, which was conducted during Q4 2015, presented in February 2016, and drew a large sample of 1,048 respondents (9% of the population canvassed).
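To put the sample size in perspective, the sketch below estimates the worst-case 95% margin of error for a proportion measured on 1,048 respondents, with and without a finite-population correction based on the roughly 11,600 people implied by the 9% figure. It assumes simple random sampling, which a voluntary survey is not, so it bounds sampling error only and says nothing about self-selection bias.

```python
import math
from typing import Optional


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96,
                    population: Optional[int] = None) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)     # standard error; p = 0.5 is the worst case
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * se


respondents = 1048
canvassed = round(respondents / 0.09)  # about 11,644 if the 9% response rate were exact
print(f"±{margin_of_error(respondents):.1%}")
print(f"±{margin_of_error(respondents, population=canvassed):.1%} with correction")
```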

Provided that Dutch eScience policy and execution have commonalities with Australian national research infrastructure policy, this survey and its accompanying report summary are a veritable treasure trove of insights and learnings, as well as a number of sobering observations about the efficacy and uptake of eScience tools, services and platforms to date.

The lightning talk aims to present a few salient points, and alert people to the availability of an English translation of the 2016 Netherlands eScience Centre survey report.


Guido Aben is AARNet’s director of eResearch.

In his current role at AARNet, Guido is responsible for building services to researchers’ demand, and generating demand for said services, with CloudStor perhaps the most widely known of those.

Libraries and Digital Humanities Downunder

Ms Ingrid Mason1

1AARNet, Sydney, Australia


This lightning talk will debate a single question:

Why does Australia need to foster the development of regional communities of practice and participate in international communities of practice, linking the digital humanities researchers and library practitioners, as part of research infrastructure capability development and library support for data intensive humanities and arts research?  

This proposed lightning talk is relevant to this year’s conference because the 2016 NCRIS Roadmap states, in Section 1.4 (Skills and Career Development), that “There are two elements to successfully utilising world-leading infrastructure. The first is training and development of both facility managers and technical staff… The second element is the skill level of researchers.”

As a guide to the reader:

dh+lib is a community of “librarians, archivists, Library & Information Science graduate students, and information specialists” [1] in the US who are keen to contribute to the conversation about digital humanities and libraries. The online platform for this community of practice of academic librarians emerged from an Association of College & Research Libraries (ACRL) “digital humanities” special interest group [2]. The Alliance of Digital Humanities Organisations (ADHO) is an international network of digital humanities organisations, and the Libraries and Digital Humanities SIG is an ADHO special interest group that aims to “foster collaboration and communication among librarians and other scholars doing digital humanities work.” [3]


  1. About dh+lib. Available from:, accessed 19 June 2017
  2. ACRL Digital Humanities Interest Group. Available from:, accessed 19 June 2017
  3. ADHO SIGS. Available from:, accessed 19 June 2017


Ingrid Mason, Deployment Strategist with AARNet, provides support for engagement and the uptake of the national research and education network (NREN) and services with AARNet members across the research, cultural and collections sectors. Ingrid has worked on several NCRIS programs: Australian National Data Service, National eResearch Collaborative Tools and Resources, and Research Data Services.

Making Terabytes of data accessible in ‘web-time’!

Mr Uwe Rosebrock1, Mr Simon Pigot1

1CSIRO, Hobart, Australia



The Australian Wave Energy Atlas (AWavEA) portal provides access to a 32-year hindcast of wave data for the Australian region at an hourly temporal resolution. In its entirety, it consists of nearly 20 TB of data, of which a 5 TB subset is used to provide real-time time-series analysis. On average, a web user’s tolerance for waiting on an asynchronous query is on the order of tens of seconds. With some simple measures, the supporting data was prepared so that analysis processes return results covering over 300,000 records in under 10 seconds, an improvement of three orders of magnitude over the standard layout of the data.
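The abstract does not specify which “simple measures” were used, but one common measure for time-series queries on gridded data is choosing the on-disk chunk layout. The sketch below counts how many chunks, and how many stored values, must be read to extract the full time series at one grid point under different chunk shapes, assuming whole chunks are always read. The array and chunk dimensions are hypothetical and are not the AWavEA layout.

```python
import math
from typing import Tuple

Shape = Tuple[int, int, int]  # (time, y, x)


def point_series_cost(shape: Shape, chunks: Shape) -> Tuple[int, int]:
    """Chunks touched and values read to extract the whole time series at one (y, x) point.

    Assumes every chunk that intersects the query is read in full and that
    chunk sizes do not exceed the array dimensions.
    """
    n_time = shape[0]
    c_time, c_y, c_x = chunks
    n_chunks = math.ceil(n_time / c_time)      # only one chunk per time block covers the point
    values_read = n_chunks * c_time * c_y * c_x
    return n_chunks, values_read


# Hypothetical grid: 32 years of hourly steps on a 200 x 300 spatial grid.
shape = (280_512, 200, 300)
layouts = {
    "one map per time step": (1, 200, 300),
    "yearly blocks of 10 x 10 points": (8_766, 10, 10),
    "full series per point": (280_512, 1, 1),
}
for label, chunks in layouts.items():
    n, values = point_series_cost(shape, chunks)
    print(f"{label:32s} chunks={n:>7,d} values read={values:>14,d}")
```

The trade-off is that a layout tuned for point time series makes whole-map queries expensive, so a portal serving both kinds of query may keep two copies or choose intermediate chunk shapes.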

With increasing amounts of data available and growing cross-disciplinary use, it is no longer feasible simply to copy large data holdings or to query them remotely. Query processes need to sit in front of the data; the NCI data cube is one example. We would like to present simple measures that improve access and make incorporation into spatial portals feasible.


Uwe Rosebrock is a Senior Software Engineer at CSIRO Oceans and Atmosphere in Hobart. He has extensive experience in large data processing, software design as well as project and defect management. Uwe leads a team of software engineers, who led the development of the ARENA-CSIRO Australian Wave Energy Atlas, and its integration into AREMI, and also developed CSIRO’s relocatable modelling system and the DIVE visualisation packages as part of the CSIRO/BoM/Navy BlueLINK program.

Journey without maps: Publishing linked open data for large archival collections

Mr Owen O’Neill1, Mr Daniel Wilksch1

1Public Record Office Victoria, Melbourne, Australia



In June 2017, Public Record Office Victoria (PROV) implemented a new repository for publishing linked open data about the open access records in our collection. This new infrastructure has enabled us to experiment with semantic web approaches such as the Linked Data Platform (LDP) for describing the contextual information and resources related to records in the collection in a way that maintains its semantic structure while being machine-readable.


Public Record Office Victoria is in the process of implementing a range of new digital infrastructure components for facilitating use of the collection. A key part of this infrastructure is a repository using Fedora Commons for storing open access content and other contextual information about the collection. This data falls into four broad categories: extended description, transcription, user generated content and renditions (copies) of records.

The content stored in our Fedora Commons repository is structured in RDF, using a simple data model based on the Portland Common Data Model (PCDM). This data model enables us to describe the content and the relationships between content entities, and to publish the data in a form that can be more easily used, interrogated and repurposed.
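A PCDM-style description of a single record can be sketched as a handful of RDF triples: the record is a `pcdm:Object`, and each rendition is a `pcdm:File` linked with `pcdm:hasFile`. The sketch emits N-Triples using only the standard library; the base URI, record identifier, title and file names are hypothetical, and PROV's actual model may use different or additional properties.

```python
from typing import List, Tuple

PCDM = "http://pcdm.org/models#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/prov/"  # hypothetical base URI

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    # Escape characters that N-Triples does not allow unescaped inside a string literal.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_triples(record_id: str, title: str, renditions: List[str]) -> List[Triple]:
    """Describe one archival record and its digitised renditions with PCDM terms."""
    record = iri(BASE + record_id)
    triples = [
        (record, iri(RDF_TYPE), iri(PCDM + "Object")),
        (record, iri(DCTERMS + "title"), literal(title)),
    ]
    for name in renditions:
        file_node = iri(f"{BASE}{record_id}/files/{name}")
        triples.append((record, iri(PCDM + "hasFile"), file_node))
        triples.append((file_node, iri(RDF_TYPE), iri(PCDM + "File")))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


print(to_ntriples(record_triples("record-0001", 'Letter "A"', ["page1.jpg", "page2.jpg"])))
```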

Although the data model meets our initial requirements, we expect to confirm and refine it over time as the amount of data ingested into the repository grows. We are also exploring the appropriate use of widely implemented ontologies. For an archive, there is a potential tension between using ontologies to facilitate interoperability and avoiding overlaying semantic meaning on the content in our custody. This tension would benefit from further consideration and discussion in the sector as the use of semantic web approaches becomes more common.



Owen O’Neill is the Program Manager for Public Record Office Victoria’s Digital Archive Program. The program aims to replace the systems that Public Record Office Victoria (PROV) uses to maintain and facilitate access to the PROV collection.

Owen has previously been involved in a number of digital preservation and data management projects in Higher Education and Research.


About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information-centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.


© 2017 - 2018 Conference Design Pty Ltd