Karmen Condic-Jurkic¹, Mark Gregson², Steven De Costa²
Computational modelling has become an integral tool in almost every branch of science, including chemistry and biology. Computational chemistry methods are now widely used to provide a better understanding of molecular processes at the atomistic level, complementing experimental findings. Molecular dynamics (MD) simulations are a powerful technique for studying molecular structure and function: they follow the movement of atoms over time by numerically solving the classical equations of motion. MD simulations are computationally demanding and time-consuming, often requiring supercomputer access and significant scientific input. Unfortunately, the primary data they generate (trajectories) are rarely made publicly available beyond the analyses presented in publications and supporting information; instead, they remain stored locally on hard drives or private servers without public access. Considering the human and computational resources needed to generate these trajectories, they represent a valuable asset for molecular studies, especially in the biomolecular and materials sciences. General-purpose repositories for research data, such as Figshare, Open Science Framework (OSF) and Zenodo, already exist, but to the best of our knowledge there is no publicly available repository dedicated exclusively to hosting and managing data generated by MD simulations. A specialised, centralised repository would offer many benefits, including standardised data description and access to large-scale data analysis.
MDbox is envisioned as a specialised open access repository for MD simulation datasets. MDbox aims to provide a platform for sharing trajectories together with their corresponding input files, which should improve the documentation of commonly used protocols and enhance the replicability and reproducibility of simulations. MDbox can also be used for research data management and serve as a long-term storage solution for users. It will make collaboration and data exchange easier, and provide an alternative route for making research publicly available and citable. A well-designed metadata schema [2,3] will improve discoverability, and the HDF5 file format can be used to store all the relevant simulation data in a single file, further simplifying data search and analysis. We are currently developing a prototype of the repository and are looking to engage the community and work with potential users to help shape the future development of this platform.
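To illustrate why structured metadata aids discoverability, the following Python sketch shows how a flat, machine-readable record per dataset allows simple filtered searches and serialisation. It is a minimal sketch under stated assumptions: the `TrajectoryRecord` class, its field names, the `search` helper and the example entries are all hypothetical and do not represent the MDbox schema, which is still under development.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class TrajectoryRecord:
    """Hypothetical metadata record for one MD dataset (illustrative fields only)."""
    title: str
    system: str          # simulated system, e.g. a protein or material
    engine: str          # MD software used to run the simulation
    force_field: str
    temperature_k: float
    length_ns: float     # total simulated time in nanoseconds
    keywords: List[str] = field(default_factory=list)


def search(records, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Made-up example entries, used only to demonstrate the search.
records = [
    TrajectoryRecord("Example membrane protein run", "membrane protein (example)",
                     "GROMACS", "CHARMM36", 310.0, 500.0,
                     ["membrane protein", "multidrug resistance"]),
    TrajectoryRecord("Example soluble protein run", "lysozyme (example)",
                     "AMBER", "ff14SB", 300.0, 100.0, ["benchmark"]),
]

hits = search(records, force_field="CHARMM36")
print([r.title for r in hits])  # ['Example membrane protein run']

# Records serialise directly to JSON for indexing or exchange.
serialized = json.dumps(asdict(records[1]), indent=2)
print(serialized)
```

Keeping each record flat and JSON-serialisable is only one possible design; a production schema would more likely build on existing proposals such as those cited above.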
In our information-driven era, the open data approach is of great value for the further development of computational modelling and for cross-disciplinary researchers in both academia and industry. The growing movement to open up data produced by publicly funded research provides an additional incentive. However, the most exciting prospects for MDbox come in the form of new research opportunities and the advancement of molecular modelling, ranging from new analytics tools for large datasets to machine learning techniques. Artificial intelligence is spreading rapidly and offers exciting opportunities in almost every area of human activity. It is expected to have a major impact on medicine, drug design, protein engineering and the creation of new materials. These methods require large, curated datasets to produce informative and valuable results, and MDbox is designed to provide exactly such datasets.
1. Hinsen, K., A data and code model for reproducible research and executable papers. Procedia Comput. Sci., 2011. 4, p. 579-588.
2. Hinsen, K., MOSAIC: A data model and file formats for molecular simulations. J. Chem. Inf. Model., 2014. 54(1), p. 131-137.
3. Thibault, J.C., Facelli, J.C., and Cheatham III, T.E., iBIOMES: managing and sharing biomolecular simulation data in a distributed environment. J. Chem. Inf. Model., 2013. 53(3), p. 726-736.
4. de Buyl, P., Colberg, P.H., and Höfling, F., H5MD: a structured, efficient, and portable file format for molecular data. Comput. Phys. Commun., 2014. 185(6), p. 1546-1553.
Karmen Condic-Jurkic earned her Master's degree in chemistry at the University of Zagreb, Croatia, in 2006. She then worked at the Rudjer Boskovic Institute in Zagreb as a research assistant, followed by a PhD in computational chemistry and biophysics awarded by Friedrich-Alexander University (Erlangen, Germany) in 2013. The same year she joined The University of Queensland in Brisbane as a postdoctoral researcher, staying there until October 2015, when she moved to the Australian National University in Canberra.
During her PhD, her research focused mainly on molecular modelling of radical enzymes and their mechanisms using various computational methods, including quantum mechanics (QM) methods, molecular dynamics (MD) simulations and hybrid QM/MM techniques. Her postdoctoral research has focused mainly on the structure and function of membrane proteins implicated in multidrug resistance, using classical MD simulations as the primary tool.