Presentations 2011
| PAPERS | |
| Guido Aben
Wendy Mason Paul Bonnington Louis Moresi |
The FileSender project: integrated software development from Proof-of-Concept code to package The FileSender project (http://www.filesender.org) is a project to bring very user-friendly large file transfer to the widest possible constituency of researchers — certainly beyond the traditional remit of comparatively computer-literate users of eScience-type tools. The project delivers software that allows an R&E service provider to bring up a file transfer service presented as a webservice. The Australian version of this service is called “CloudStor” (https://cloudstor.aarnet.edu.au) and is hosted by AARNet. Judging by the uptake of the FileSender project’s principal software package, the project has been a success; user uptake is swift, does not require any hand-holding and the majority of users are repeat customers. Also, the number of international R&E providers running this package is increasing rapidly, thereby spreading maintenance load and providing many more critical eyeballs to spot quirks and bugs. This talk focuses on a handful of reasons why we think the FileSender project has managed to increase its chances of success; mostly they revolve around our policy of assuming from the outset that the project might have to scale up significantly beyond a small in-house development, rather than taking an ad-hoc path-of-least-resistance early on. We hope the talk may inspire members of the audience involved in software development, software project management and eResearch tools policy to take the longer view and plan for large scale and longer horizons; the reward, we feel, is well worth it. Biography – Guido Aben Biography – Wendy Mason Biography – Paul Bonnington Biography – Louis Moresi |
| Rasika Amarasiri
Philip Gharghori |
MonFiDS: An Integrated Financial Database Many researchers work on the same datasets, and each of them separately integrates those datasets and checks the merged result. This duplication of effort wastes valuable resources and time that could have been spent on more productive work. The Monash Financial Database System (MonFiDS) was developed to minimise this duplication and to let researchers concentrate on the actual research, relieving them of the burden of integrating the different datasets and checking the integrity of the merge. MonFiDS currently integrates financial, accounting, market index and earnings estimate data for Australian listed companies from five different data sources. The database allows researchers to download this data selectively or as a whole via a web portal. Common manipulations from the merging process are built into the extraction process, minimising the need for post-processing of the data. Biography – Rasika Amarasiri Biography – Philip Gharghori |
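To illustrate the kind of merge-and-check step that MonFiDS automates, here is a minimal Python sketch. It assumes two hypothetical sources keyed by (company code, date); the function name `merge_sources` and the field names are illustrative only, not MonFiDS's actual schema or code. It joins records present in both sources, refuses to silently overwrite conflicting values, and reports keys that appear in only one source so they can be checked.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (company code, ISO date): a hypothetical key scheme

def merge_sources(
    prices: Dict[Key, dict], accounts: Dict[Key, dict]
) -> Tuple[Dict[Key, dict], List[Key], List[Key]]:
    """Join two sources on a shared key and report keys found in only one."""
    merged: Dict[Key, dict] = {}
    for key in prices.keys() & accounts.keys():
        record = dict(prices[key])
        for field, value in accounts[key].items():
            # A conflicting value for a shared field is an integrity error,
            # so raise instead of silently overwriting it.
            if field in record and record[field] != value:
                raise ValueError(f"conflicting {field!r} for {key}")
            record[field] = value
        merged[key] = record
    only_prices = sorted(prices.keys() - accounts.keys())
    only_accounts = sorted(accounts.keys() - prices.keys())
    return merged, only_prices, only_accounts
```

The unmatched-key lists are what a curator would inspect after each load; an inner join alone would hide them.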
| Steve Androulakis
Ulrich Felzmann Alistair Grant Ian Thomas Ryan Green Grischa Meyer Anthony Beitz Paul Bonnington Chris Myers Heinz Schmidt Ashley Buckle |
Taking TARDIS Into New Dimensions: Results and Reflections MyTARDIS began as an automated solution for managing and sharing raw protein crystallography data. Since then, efforts from many independent projects have enhanced and evolved the central MyTARDIS product. New features such as data staging mounts, automated metadata extractors, parameter set creation and high performance computing task scheduling have been added to meet researcher needs. With these new features in hand, MyTARDIS is currently being deployed to manage data from diverse areas of research, including microscopy / microanalysis, particle physics and next-generation sequencing. It is also expanding at the Australian Synchrotron and ANSTO to support small / wide angle x-ray scattering, infrared microspectroscopy, powder diffraction, neutron reflectometry, small-angle neutron scattering and strain scanning data. Furthermore, an initiative to capture and publish all types of research data at an institutional level has begun. This presentation will feature speakers from individual projects working with the open source MyTARDIS code base, along with an explanation of the software’s new developments and personal experiences from attempting to richly capture and manage an expanding range of research data. Biography – Steve Androulakis Biography – Grischa R Meyer Biography – Paul Bonnington Biography – Ian Thomas |
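An automated metadata extractor of the kind listed above can be sketched as a registry that maps file types to parsing functions, each producing a parameter set. This is a hypothetical illustration, not MyTARDIS's actual extractor interface; the `.hdr` extension, the `key = value` header format and all function names are assumptions.

```python
import os
from typing import Callable, Dict, Optional

ParameterSet = Dict[str, str]
# Hypothetical registry: lower-case file extension -> extractor function.
EXTRACTORS: Dict[str, Callable[[str], ParameterSet]] = {}

def extractor(ext: str):
    """Decorator that registers an extractor for one file extension."""
    def register(func: Callable[[str], ParameterSet]) -> Callable[[str], ParameterSet]:
        EXTRACTORS[ext.lower()] = func
        return func
    return register

@extractor(".hdr")
def parse_key_value_header(text: str) -> ParameterSet:
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    params: ParameterSet = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params

def extract_metadata(filename: str, text: str) -> Optional[ParameterSet]:
    """Run the extractor registered for the file's extension, if there is one."""
    ext = os.path.splitext(filename)[1].lower()
    func = EXTRACTORS.get(ext)
    return func(text) if func else None
```

New instrument types would then need only a new registered function, leaving the ingest path unchanged.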
| Baden Appleyard | Licensing Australia’s Research Outputs Under a Single Framework AusGOAL (Australian Governments Open Access and Licensing Framework) provides support and guidance to government and related sectors to facilitate open access to publicly funded information. AusGOAL makes it possible to use and re-use information or data in a way that drives innovation and entrepreneurial activities and provides economic and social benefits to the community. Opening Australia’s Publicly Funded Research and Innovation. Australia has a vibrant research and innovation sector which supports and enhances virtually every aspect of our life experiences. Although all aspects of research are important, this part of the AusGOAL website focuses on the licensing of research outputs. A significant fraction of the research output produced in Australia is publicly funded, either directly via the major funding bodies or indirectly via institutional funding. Like the Australian Governments, the major research funding bodies and research organisations are moving towards requiring that information and data from their projects be published and made available for re-use. Licensing research data using AusGOAL. One of the essential ingredients for making data re-usable is clarity of permissions, terms and conditions. Prospective re-users need to know what they can and cannot do with the data. A lack of clarity about permission to re-use data can have the same result as forbidding re-use, because uncertainty can be enough to discourage the potential re-user. While there are many different ways of licensing data, there is a strong advantage in using a consistent approach. The technology now exists to allow individual datasets to be combined with others in novel ways to help solve ever more complex problems.
The use of a single licensing framework, such as AusGOAL, across the research community and public sector enhances the potential for data sets to be combined for analysis, relieving researchers of the burden of keeping track of different conditions and permissions. Biography |
| Justin Baker
Peter Tyson |
CSIRO Remote Visualisation Capability CSIRO is a large, geographically dispersed research organisation with over six thousand staff at fifty-six sites. Like all modern research organisations, CSIRO researchers are dealing with ever larger, more complex scientific datasets. In response to these two significant and distinct issues (geographical separation of staff and increasing data complexity), an enterprise-wide remote visualisation capability has been developed. Scientific visualisation is generally recognised as key to addressing many data complexity issues. Complementing it, remote visualisation provides researchers with virtualised access to advanced visualisation hardware and software, regardless of their location. CSIRO’s remote visualisation capability is being progressively rolled out across the organisation. The architecture is based on commodity computing hardware and two key open source projects. The first of these, VirtualGL, virtualises the graphics hardware, making it remotely accessible. The second, VizStack, is used to allocate limited compute and graphics hardware resources to users as required. This approach has several significant advantages. The remote visualisation service enables scientists to access high-end graphics hardware directly from their desktop PCs, reducing the need to purchase dedicated visualisation workstations. It provides a shared, remote collaborative viewing capability, enabling researchers at different locations to interact with one another’s data in real time. The virtualised environment provides access to a wider range of applications on different platforms, rather than being limited to the user’s local operating system. Lastly, the visualisation hardware is centrally managed in a data centre and uses a shared storage system, minimising the need to transfer large volumes of data between different systems. Biography – Justin Baker Biography – Peter Tyson |
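The resource-allocation role described for VizStack can be illustrated with a toy first-fit allocator. This is not VizStack's API or scheduling policy; the `GpuPool` class and the GPU identifiers are hypothetical. The sketch only shows the basic idea of handing a limited pool of graphics devices to users on request and reclaiming them when sessions end.

```python
from typing import Dict, List, Optional

class GpuPool:
    """Toy first-fit allocator for a fixed set of shared GPUs (hypothetical)."""

    def __init__(self, gpu_ids: List[str]) -> None:
        self.free: List[str] = list(gpu_ids)
        self.assigned: Dict[str, str] = {}  # gpu id -> user holding it

    def allocate(self, user: str) -> Optional[str]:
        """Give the user the first free GPU, or None if all are busy."""
        if not self.free:
            return None
        gpu = self.free.pop(0)
        self.assigned[gpu] = user
        return gpu

    def release(self, gpu: str) -> None:
        """Return a GPU to the pool when the user's session ends."""
        if gpu not in self.assigned:
            raise KeyError(f"{gpu} is not allocated")
        del self.assigned[gpu]
        self.free.append(gpu)
```

A production scheduler would also handle queuing, session timeouts and multi-GPU requests, none of which are modelled here.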
| Venki Balasubramanian
Amir Aryani Ian Thomas Heinz Schmidt |
Data Capture from High-Performance Computing Facilities: A Case Study e-Research has enabled researchers to develop new insights and new solutions to complex problems through the use of technology in research collaboration. The Data Curation (DC) application is being developed in the eResearch Office at RMIT University to curate datasets generated by various materials physics simulation packages on High Performance Computing (HPC) facilities. This presentation illustrates the challenges faced during development and shows how some of them were resolved. An end-to-end e-Research system involves sub-systems that are heterogeneous and domain specific. We treated interoperability as a prime consideration because of the lack of established e-Research standards for both systems and data; this necessitated the use of data adapters and converters. Contemporary designs with role-based security models may not work well for e-Research software, because extensive collaboration between institutions makes it difficult to define fixed roles for researchers. We found that authorisation of researchers is a major concern, best addressed by a federated approach. We also identified harvesting data from existing legacy systems as a common concern in e-Research. We chose a “system of systems” architecture; because its components are orchestrated ad hoc, overall reliability is difficult to control. While reusability is a prime consideration for reducing development time and costs, researchers and developers in specific disciplines tend to create software to solve their own problems, which limits wider, long-term reuse of solutions. Nevertheless, the involvement of domain experts in development is imperative: we observed that the project would not have succeeded without the collaboration of the researchers (domain users).
However, users want to change their current workflows as little as possible, so solutions must integrate carefully and show clear advantages to the researchers. Biography – Venki Balasubramanian Biography – Amir Aryani Biography – Ian Thomas Biography – Heinz Schmidt |
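The data adapters and converters mentioned above can be sketched as one adapter function per source format. Each adapter translates raw output into a common record, and a dispatch table chooses the adapter. Both package output formats, the `DatasetRecord` schema and the adapter names are hypothetical; the Hartree-to-eV factor shows where a unit converter would sit.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class DatasetRecord:
    """Common metadata record stored by the curation step (hypothetical schema)."""
    package: str
    energy_ev: float
    atoms: int

def adapt_pkg_a(text: str) -> DatasetRecord:
    # Hypothetical package A writes lines such as "TOTAL_ENERGY: -12.5" (in eV).
    fields = dict(line.split(":", 1) for line in text.splitlines() if ":" in line)
    return DatasetRecord("pkg_a", float(fields["TOTAL_ENERGY"]), int(fields["NATOMS"]))

def adapt_pkg_b(text: str) -> DatasetRecord:
    # Hypothetical package B writes a single CSV row "atoms,energy_hartree".
    atoms, energy_ha = text.strip().split(",")
    hartree_to_ev = 27.211386  # unit converter: Hartree -> eV
    return DatasetRecord("pkg_b", float(energy_ha) * hartree_to_ev, int(atoms))

ADAPTERS: Dict[str, Callable[[str], DatasetRecord]] = {
    "pkg_a": adapt_pkg_a,
    "pkg_b": adapt_pkg_b,
}

def ingest(package: str, raw_output: str) -> DatasetRecord:
    """Dispatch raw output to the adapter for the package that produced it."""
    return ADAPTERS[package](raw_output)
```

Keeping format knowledge inside the adapters means the rest of the curation pipeline only ever sees `DatasetRecord`.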
| Anthony Beitz
Calvin Chow Paul Bonnington Steve Androulakis Virginia Gutierre Simon Yu |
Platforms for Research Data Management: Lessons Learned Properly capturing, managing, publishing and reusing research data and metadata is of growing importance. The Australian Government has injected research infrastructure funds into this activity (the NCRIS & Super Science initiatives), and these funds are distributed and managed by the Australian National Data Service (ANDS). Consequently, new and adapted platforms for Research Data Management (RDM) have proliferated at Australian research institutions, a trend expected to continue over the next few years owing to additional Australian research infrastructure investments. Monash University has experience in: developing research data management platforms for the DART and ARCHER projects; determining research data capture and management solutions for various research disciplines in ANDS data capture projects conducted at Monash; and developing and deploying innovative research capabilities within the university. Informed by these experiences, this paper provides guidance on the selection, development and deployment of platforms for research data management. Biography – Anthony Beitz Biography – Paul Bonnington Biography – Steve Androulakis Biography – Calvin Chow Biography – Simon Yu Biography – Virginia Gutierre |
| Craig Bellamy | Teaching the Digital Humanities through Virtual Research Environments At the core of the work done within the digital humanities is a difficult interdisciplinary relationship between the at times divergent cognate fields of computer science and the humanities. This presentation will discuss some of the central characteristics of the digital humanities whilst examining some of its ‘hard-interdisciplinary’ relationships. The author will suggest a model in which ‘hard-interdisciplinarity’ may be taught and assessed through the framework of Virtual Research Environments (VREs). The presentation will demonstrate some of the latest work in the development of VREs in the humanities that encourage the critical use and analysis of the digital objects within them. The author contends that building ‘hard-interdisciplinary’ relationships between the humanities and computing technology should engender a critical and deeply scholarly understanding of technological production, and that VREs are one way to achieve this in the classroom. Biography |
| Jared Berghold
Tim Churches |
SURE: a secure remote-access data analysis laboratory for research using linked health data. Australia has a well-established system of health data collections, many of which cover the entire population and/or both the public and private health care sectors. The scope and population coverage of this health data infrastructure offers enormously valuable opportunities for research which explores disease causation and prevention, health differentials and inequalities, geographic and spatial aspects of health, and the effectiveness of treatments and health services. However, most of these health data collections relate to episodes of care or to specific diseases. In order to assemble research data which provides a longitudinal view at the person level, the records in these data collections which relate to a particular individual must be linked together. In the absence of a unique personal identification number of broad scope in Australia, linkage of routinely-collected health records from multiple sources and settings is performed for research purposes by special-purpose Data Linkage Units (DLUs). These DLUs receive names, addresses, dates of birth and other identifying details abstracted from health system records, but they have no access to the health or medical details contained in those source records. Researchers are provided with de-identified versions of the relevant health records, together with research-study-specific sets of person-level links between those records, enabling them to conduct longitudinal and other complex linked-data analyses. Such arrangements provide excellent first-order protection of individual privacy. However, residual risks to privacy remain, despite the nominal de-identification of the linked data provided to researchers, due to the very high dimensionality of the linked data sets, and the high cardinality of many of the data items which they contain – this makes re-identification of the linked data feasible, and in some cases, easy. 
Therefore, it is important that researchers treat the linked, de-identified data as highly confidential and keep these data very safe. The impact of unauthorised access to, or loss of control over, these data is potentially large, given that many linked data research studies require access to health data for hundreds of thousands or millions of individuals. Currently in Australia, this risk is managed through researcher undertakings that they will not attempt any form of re-identification or further record linkage of the data supplied to them, and that they will prevent unauthorised access to those data. There is no reason to believe that researchers do not strive to honour these obligations. However, the data supplied to researchers are, of course, in digital form, and are typically stored and used in institutional computing environments which may not have been designed with security in mind. Often researchers have limited ability to influence or even determine the computing security arrangements in their workplaces. Such problems are compounded when researchers in multiple locations need to work on the same linked research datasets as part of a collaborative study. In order to better manage these risks, as part of the NCRIS Population Health Research Network capability and supplemented by EIF SuperScience funds, the Australian and NSW governments have jointly funded the establishment of a secure, remote-access data analysis facility specifically for use by population health, health services and clinical researchers working with linked data. The facility, known as SURE, is currently undergoing pre-production testing prior to becoming operational in December 2011. It provides researchers with a highly secure remote virtual computing desktop for each research study on which they are an investigator.
All data ingress to and egress from the facility passes through a “Curated Gateway”. This ensures that only those data files which have been approved and permitted by the Human Research Ethics Committee(s) with oversight of each research study are brought into the SURE environment, and that only those research outputs which have been screened for potential privacy disclosure risk are released from the facility to the relevant researchers’ normal computing environments. Researchers must undergo compulsory training in privacy, IT security and statistical disclosure risk assessment and control before being permitted to use the facility. Inside the facility, each research study is additionally confined within its own security perimeter – there is no possibility of data exchange between research studies. Access to the facility is strongly authenticated using two factors. The remote virtual computing environments provided to each researcher are powerful, highly-specified Microsoft Windows 7 desktops, furnished with a wide range of proprietary and open-source data manipulation and analysis software. However, despite all these perimeter controls, within the workspaces for each study, there are no additional restrictions placed on researchers – they may examine, manipulate and analyse the data in whatever ways they see fit, within the constraints placed upon them by the data providers and overseeing ethics committee(s) for that study. The centralised nature of the facility allows significant economies of scale with respect to hardware provisioning and software licensing costs, and it is expected to be financially sustainable at levels of cost-recovery which are acceptable to researchers and research funding agencies. SURE also facilitates the creation of specialised data manipulation and analysis tools which would be difficult to deploy in existing, heterogeneous research computing environments – these tools will be described.
Biography – Jared Berghold Biography – Tim Churches |
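The claim that high dimensionality and high cardinality make nominally de-identified linked data re-identifiable can be made concrete. A small sketch counts how many records are unique on a combination of quasi-identifying fields. This is a generic k-anonymity-style check, not a description of SURE's disclosure-control tools, and the field names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

def unique_fraction(records: List[Dict[str, str]], quasi_ids: Sequence[str]) -> float:
    """Fraction of records whose quasi-identifier combination occurs exactly once.

    Such records have k = 1 in k-anonymity terms: anyone who knows those
    attribute values for a person can single out that person's record.
    """
    if not records:
        return 0.0
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    singletons = sum(1 for r in records if combos[tuple(r[q] for q in quasi_ids)] == 1)
    return singletons / len(records)
```

Adding fields can only split groups further, so the unique fraction never decreases as more linked variables are considered. This is the effect the abstract describes for high-dimensional linked data.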
| Peter Blain
Paola Petrelli Jason Lohrey Nathan Bindoff |
Outcomes of the Marine and Climate Data Discovery and Access Project (MACDDAP) The Marine and Climate Data Discovery and Access Project (MACDDAP) was an e-Research project, funded by the National eResearch Architecture Taskforce (NeAT) under the National Collaborative Research Infrastructure Strategy (NCRIS). The project was completed in June 2011 and successfully delivered on its stated objective, which was to integrate large marine and climate data sets, and to deliver them through a wide range of data streams – thus engaging a broad community. The project built on web services technology to integrate marine and climate data sets distributed across Australian research institutions. The outputs delivered by MACDDAP facilitate knowledge discovery for marine and climate related applications by enabling researchers to collect, combine and analyse relevant data across scientific disciplines. MACDDAP has built on open scientific and geospatial data standards to enhance specialised web harvesters and search tools, to deliver large geospatial data-sets to users via web portals. MACDDAP also provides the functionality required to support these services, including an aggregator for combining geospatial data from distributed sources, and a translator for translating data sets into standard vocabularies used in meteorology and oceanography. Biography – Peter Blain. Biography – Paola Petrelli. Biography – Jason Lohrey. Biography – Nathan Bindoff. |
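MACDDAP's translator maps data sets onto standard vocabularies. A minimal sketch of that idea assumes a hypothetical provider that uses short local variable codes. The sketch renames those variables to Climate and Forecast (CF) standard names, a vocabulary widely used in meteorology and oceanography, and reports anything it cannot map. The mapping table is illustrative only; the abstract does not specify MACDDAP's actual translator or target vocabularies.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a provider's local variable codes to CF standard names.
LOCAL_TO_CF: Dict[str, str] = {
    "TEMP": "sea_water_temperature",
    "PSAL": "sea_water_salinity",
    "AIRT": "air_temperature",
}

def translate(variables: Dict[str, list]) -> Tuple[Dict[str, list], List[str]]:
    """Rename variables to CF standard names; return untranslated names separately."""
    translated: Dict[str, list] = {}
    unmapped: List[str] = []
    for name, values in variables.items():
        cf_name = LOCAL_TO_CF.get(name.upper())
        if cf_name is None:
            unmapped.append(name)  # flag for a curator rather than guessing
        else:
            translated[cf_name] = values
    return translated, unmapped
```

Returning the unmapped names, rather than dropping them silently, lets a data manager extend the mapping table as new providers are harvested.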
| Ann Borda
Lyle Winton |
Research Data Infrastructure Approaches The development and implementation of research infrastructures have been shaped by the need to collaborate on, retain and reuse data. Documenting the practice of research, and therefore the context of the data, is essential for the easy discovery of appropriate data for reuse in the future. Increasingly, institutions are providing support systems that facilitate the management of, and collaboration in, research projects. Such systems allow the creation of virtual research environments (VREs) or collaboration environments, which can serve as documentation of the research process as well as repositories of data and records. Research infrastructure providers take on responsibility for supporting this research need but are not always well suited to long-term retention. Biography – Ann Borda Biography – Lyle Winton |
| Joshua Bowden | OpenCL implementations of principal component analysis for large scale data analysis and benchmarking of heterogeneous clusters. Programming environments for general-purpose computation on Graphics Processing Units (GPUs) have improved rapidly in the past decade. They allow a programmer to tap into the potential of GPU-based devices for non-graphics tasks. As a widely adopted programming standard, OpenCL attempts to standardize the programming of the various devices constituting a heterogeneous computing system. A benefit of using widely adopted standards such as OpenCL is that they allow the performance of an algorithm to be compared across a variety of modern CPU architectures and GPU-based systems. An OpenCL implementation of the Non-linear Iterative Partial Least Squares (NIPALS) algorithm used in principal component analysis has been used as a benchmarking program to test a range of hardware on the core vector-matrix operations at the heart of the algorithm. Because the algorithm is iterative, it can be time-consuming for large data sets. Results of benchmarking workstation, cluster and cloud-based solutions are described. Measuring and modelling these workloads gives a better understanding of the economies the different systems bring to research computation. Biography |
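For readers unfamiliar with NIPALS, the following pure-Python sketch extracts leading principal components. It alternates two matrix-vector products until the score vector converges, then deflates the data matrix. This is a plain sequential illustration of the algorithm, not the OpenCL implementation benchmarked in the talk. The convergence tolerance, the iteration cap and the starting column (the one with the largest sum of squares) are choices made here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def center(X: Matrix) -> Matrix:
    """Subtract each column's mean, as PCA requires (returns a new matrix)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]

def nipals(X: Matrix, n_components: int, tol: float = 1e-12,
           max_iter: int = 500) -> Tuple[Matrix, Matrix]:
    """Return (scores, loadings), one vector per principal component.

    Each iteration performs two matrix-vector products, X^T t and X p;
    these are the operations a GPU kernel would parallelise.
    """
    X = center(X)
    n, m = len(X), len(X[0])
    scores: Matrix = []
    loadings: Matrix = []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(row[j] ** 2 for row in X))
        t = [row[j0] for row in X]
        if not any(t):
            raise ValueError("no variance left to extract")
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading estimate p = X^T t / (t^T t), normalised to unit length.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score update t = X p; stop once it no longer changes.
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # Deflate: remove this component before extracting the next one.
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings
```

The two comprehensions that compute `p` and `t_new` dominate the run time for large matrices. They would be the natural candidates for device kernels in an OpenCL port.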
| Andrew Buttsworth
Rhys Newman Peter Wheeler |
The SkyNet: Harnessing the Power of the Community for Radio Astronomy Research The International Centre for Radio Astronomy Research initiated the SkyNet project to engage the community and raise awareness of radio astronomy. The initial proposal was inspired by the many citizen science projects that have benefited from making science accessible to the broader community; this community engagement gave those projects access to resources that would have been impossible to fund by normal means. The distributed computing backend of the SkyNet is based on Nereus-V, an open-source, pure Java™ desktop cloud distributed computing technology. If we can achieve a client base of approximately 10,000 workstations in the higher education sector around Perth, we will have a distributed computing network capable of approximately 100 TFLOPS. This could have a significant impact on the Australia/New Zealand bid to host the Square Kilometre Array, and will allow thousands of users to contribute directly to ground-breaking research. Biography – Andrew Buttsworth Biography – Rhys Newman Biography – Pete Wheeler |
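The headline figure implies an average sustained contribution per workstation that is easy to check. This is a simple division of the two numbers stated above, not a measured benchmark:

```latex
\frac{100\ \text{TFLOPS}}{10\,000\ \text{workstations}}
  = \frac{100 \times 10^{12}\ \text{FLOPS}}{10^{4}}
  = 10^{10}\ \text{FLOPS}
  = 10\ \text{GFLOPS per workstation}
```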
| Leslie Carr | Mind the Gap! Moving From Aspiration to Experience in UK Institutional Research Data Management The aim of the Institutional Data Management Blueprint (IDMB) project, funded by JISC in the UK, has been to create a practical and attainable institutional framework for managing research data that facilitates ambitious e-research practice. A candidate tool to support this responsibility is the institutional repository – an information storage and management tool conjoined with extensive social support and advice structures from the library. To help institutions acknowledge and manage their data management responsibilities, IDMB provides an overall framework within which to plan and develop an institutional data management strategy. This paper describes the main practical developments being made to an institutional repository platform as a result of the IDMB data management survey and audit. The University of Southampton Institutional Repository is based on the EPrints platform (v. 3.2). It is configured with rudimentary data support that makes research data discoverable, but not easily interpretable or reusable. A table of data points may be provided as a spreadsheet, a database or a PDF, but guidance on interpreting those figures is hard to come by. Nor is it easy to understand the relationship between multiple data files (the components of complex data objects). The paper describes some simple amendments to the repository’s document model that facilitate human and software interpretation of the document contents and of the role of individual data components. Biography |
| Ron Chernich
Peggy Newman Simon McNaughton Jane Hunter |
CABER – A Registry for Recording and Reporting Australian Algal Blooms CABER is a web-based interface for capturing monitoring data and sightings of potentially toxic algal blooms. The project is a collaboration between the Qld Dept of Environment and Resource Management (DERM), the UQ eResearch Lab and Healthy Waterways. Currently it contains data specific to the coastal and estuarine regions of South-East Queensland but is designed to support algal bloom observations from across Australia. Our presentation will demonstrate the data capture, upload and visualization methods (including the mapping and timeline search and browse interface) for recording and analysing algal bloom observations. We will also describe the iPhone application developed to enable field data capture (including photos and species identification). Challenges experienced in adopting smart-phone technology will be described together with future directions and the problems associated with managing community-generated data. Biography – Jane Hunter Biography – Peggy Newman |
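The map-and-timeline search over bloom observations amounts to filtering records by a geographic bounding box and a date range. A minimal sketch follows. The `BloomObservation` fields and the `search` function are hypothetical, not CABER's data model. A production service would more likely push such filters into a spatial database than scan a list in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass
class BloomObservation:
    """One algal bloom sighting (hypothetical fields, not CABER's schema)."""
    lat: float
    lon: float
    observed: date
    species: str

def search(obs: List[BloomObservation],
           bbox: Tuple[float, float, float, float],
           start: date, end: date) -> List[BloomObservation]:
    """Return observations inside bbox=(min_lat, min_lon, max_lat, max_lon)
    whose date falls in the inclusive range [start, end]."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        o for o in obs
        if min_lat <= o.lat <= max_lat
        and min_lon <= o.lon <= max_lon
        and start <= o.observed <= end
    ]
```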
| Paul Coddington | Outcomes of the NeAT Program: eResearch tools and services for national research communities The National eResearch Architecture Taskforce (NeAT) was a committee of experts established under the NCRIS Platforms for Collaboration capability. NeAT was responsible for identifying a portfolio of projects to develop and implement new eResearch tools and services. Each project needed to: provide production eResearch tools or services to meet the needs of a particular research community, but with potential for broader use; encourage eResearch uptake, awareness raising and skills development; aim to significantly improve research processes; identify long-term providers that will host and support the services; and have significant co-investment from the user community. Fifteen projects targeting a broad range of research disciplines were selected by NeAT. Projects received funding for 2.5 to 4 EFTs for 18 months to 3 years, with significant additional in-kind effort and resources provided by the project partners. Funding, management and technical input to the NeAT program was provided jointly by the Australian National Data Service (ANDS) and the Australian Research Collaboration Services (ARCS). Ten of the NeAT projects targeted the requirements of NCRIS national scientific research communities, with four other projects working with national organisations in the humanities and social sciences. Most of the projects were strongly focussed on managing, accessing or sharing data, with the others providing tools and services for analysis, processing and visualisation of data sets. The NeAT projects provide exemplars of how research practices can be improved or transformed by the use of eResearch tools. Researchers and research organisations are reporting that the NeAT-funded tools are having significant impact on their research or how they deliver data or eResearch services to their community. 
This presentation will provide an overview of the NeAT program and a brief summary of the outcomes of each of the NeAT projects, and the overall program. Biography |
| Michael D’Silva
Chris Myers |
The ‘Imax’ of science labs – the next generation of eResearch In the past, VeRSI has demonstrated the eVBL (educational Virtual BeamLine), which proved that remote access to the Australian Synchrotron was possible. VeRSI then showed that Synchrotron users could remotely load samples and move motors on MX1 (Macromolecular Crystallography). VeRSI has now pushed the boundaries of remote access and remote control in the Australian research space. Providing collaborative remote access to, and remote control of, an expensive instrument such as a beamline at the Australian Synchrotron or the XPS (X-ray photoelectron spectroscopy) instrument at La Trobe University Bundoora is very difficult. The people responsible for these expensive instruments do not like having more than a few people near them. All users who go near the instrument and/or the facility must undertake special training, such as OHSE and Radiation Safety. Also, “due to the nature and expense of these instruments, sharing instruments is essential and may require researchers to travel to the location of the instrument.”[2] This costs both time and money and often causes scheduling and data transportation problems. To tackle this problem, La Trobe’s eResearch Office, La Trobe’s CMSS (Centre for Materials and Surface Science) and VeRSI collaborated to build a room called VisLab1. This room provides an immersive environment in which a group of up to 30 researchers or students can access instruments from a remote location. The high-tech laboratory contains the latest visualisation technology, including a 95 m² multi-screen projection wall, six touch screens and video conferencing equipment, all in 1080p High Definition. It also has a twelve-monitor 175″ display wall driven by a Microsoft Windows PC for displaying ultra-high resolution visualisation data.
REFERENCES
1. La Trobe University Bulletin, The ‘Imax’ of science labs – next generation facilities, 2011. http://latrobeuniversitybulletin.com/2011/06/08/the-’imax’-of-science-labs/
2. VeRSI, VisLab Sneak Peek, eNewsletter 15, 2011. https://www.versi.edu.au/news-and-publications/enewsletter/enewsletter-15/vislab-launch
Biography – Michael D’Silva Biography – Chris Myers |
| David Eyers
Russell Butson |
Managing sensitive data across the data life cycle The management of, and responsibility for, raw data is a central aspect of empirical research. Most traditional texts on the practice of research have sections outlining various approaches for categorizing, storing and recovering information from concrete artefacts like paper and tape. For many, the adoption of more abstract digital media meant a shift from autonomous control to reliance on technologists. Over the years researchers have become more self-reliant through the widespread use of personal computing. The proliferation of cheap, high-capacity storage technology has made it possible for researchers to store large amounts of data. However, the ethical responsibility on principal investigators requires the management of raw data beyond the task of storage. Many types of research projects require collaborative sourcing, management or sharing of sensitive datasets. Researchers often make do with an ad hoc approach to sharing data, without fully appreciating (or even considering) the risks involved, often because higher quality, managed solutions are perceived to be inaccessible. However, the significant economies of scale gained by having backup and online redundancy of physical media managed independently of the research data create difficulties where the repositories contain highly sensitive data. In these contexts, a technology host that could previously remain agnostic to the specifics of researchers’ applications must now partition its infrastructure in a complementary manner in order to provide appropriate security assurances. The University of Otago is currently exploring the implications of developing a secure storage capability, including the integrated Rule-Oriented Data System (iRODS) storage middleware, that aligns with the workflow needs of researchers working with patient data within the healthcare sector.
The project aims to achieve a workable model and a set of guidelines for controlling the access, storage, retrieval, replication and analysis of highly sensitive data within a secure environment. Biography – David Eyers Biography – Russell Butson |
| Ryan Fraser
Terry Rankine Josh Vote Lesley Wyborn Ben Evans |
Virtual Geophysics Laboratory (VGL): scientific workflows exploiting the Cloud The Virtual Geophysics Laboratory (VGL) is a scientific workflow portal that provides geophysicists with access to an integrated environment that exploits eResearch tools and Cloud computing technology. The VGL is a collaboration between CSIRO, Geoscience Australia (GA) and the National Computational Infrastructure (NCI) and has been funded by the Federal Government’s Education Investment Fund. The VGL gives scientists easy access to multiple eResearch and Cloud technologies through a user-driven interface. The VGL was developed in close collaboration with the geophysics user community, including representatives from GA and ANU, and has been deployed directly into their environment. Biography – Ryan Fraser Biography – Josh Vote Biography – Terry Rankine Biography – Ben Evans Biography – Lesley Wyborn |
| Dave Fulker | OPeNDAP Roadmap to New Server-Side Capabilities and Other Supports for Data-Intensive This presentation summarizes the roadmap (parts of which are firm while others are tentative) being charted for the future of Hyrax and OPeNDAP. Four topics will be covered: a) server-side subsetting of UGRIDs and other classes of non-rectangular meshes, as well as of unstructured collections of (space-time) point observations such as station data; b) building and using inventories of OPeNDAP-accessible data sets that reflect user-specified constraint expressions and space-time resolutions; c) increased compatibility and commonality between OPeNDAP’s Hyrax and Unidata’s THREDDS Data Server (TDS), based on a newly minted set of OPeNDAP protocol specifications and an associated set of conformance tests; d) the impact of cloud computing on the need for (Hyrax) data services, including potential changes in the social aspects of data exchange, use and reuse. For example, might a new paradigm emerge in which cloud-based processing systems are expected to create provenance and citation records that are immediately suitable for publication? Biography |
| Wojtek Goscinski
Timur Gureyev Chris Hall Anton Maksimenko Arthur Sakellariou Darren Thompson |
The Multi-Modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) for Near Realtime CT Reconstruction using XLI at Australian Synchrotron The Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) is a specialised Australian high performance computing facility for computational imaging and visualisation. This facility has been formed to provide researchers with the hardware, software and expertise to drive research in the characterization, biomedical science, materials research, engineering, and geoscience communities, and to stimulate advanced imaging research that will be exploited across a range of imaging modalities, including synchrotron imaging, neuroimaging, electron microscopy and optical microscopy. This presentation will introduce the MASSIVE project and present its role in a near-realtime Computed Tomography (CT) reconstruction service for the Imaging and Medical Beamline (IMBL) at the Australian Synchrotron, using the XLI / X-TRACT software. This service allows researchers using the IMBL to perform fast CT reconstruction and visualisation during an experiment. We will present XLI CT reconstruction performance results across the MASSIVE platform and early experience using the prototype service. Biography – Wojtek Goscinski |
| Rhys Hawkins
Ben Evans Deborah Mitchell Steven McEachern |
Visualising spatially-coded data at the Australian Data Archive There has been an increasing need for spatial information to be made available through web-based tools that link seamlessly to data repositories. The Australian Data Archive (ADA), formerly the Australian Social Science Data Archive (ASSDA), is one example of a critical research data repository with potential for such tools. In this paper we will present our work on the ADA spatial data framework and describe our new online tools for exploring spatial social science data. This new capability has had implications for the entire data workflow for archiving survey data: from designing surveys to record geospatial identifiers accurately, to maintaining the confidentiality of geo-located respondents’ information so that unauthorised users cannot identify them, to giving researchers access to the data in new and powerful ways. Biography – Rhys Hawkins Biography – Ben Evans Biography – Deborah Mitchell Biography – Steven McEachern |
| Leonie Hellmers | eResearch Survey: first longitudinal report This paper will present findings from the first longitudinal survey investigating eResearch practices and attitudes across the higher education sector. The survey was first rolled out in 2009 across seven NSW universities, with over 1,000 participating researchers. It is about to be repeated and this second round of responses, available in time for the conference, will allow us to observe changes in attitudes and behaviours with respect to eResearch over the last two years. Results from 2009 provided valuable baseline data. Significantly, results pointed to a gap between researchers’ willingness and obvious need to adopt eResearch practices and their limited awareness and utilisation of eResearch and eResearch bodies. This presentation offers findings from the second round of the survey towards three aims: a) tracking movements in researcher technology-enhanced practices, needs and constraints; b) continuing the discussion about the importance of considering these practices when developing research infrastructures and services; and c) monitoring the effectiveness of eResearch support agencies over time. Are we effectively engaging with researchers? Is there a noticeable impact on the uptake of eResearch technologies? What improvements are evident in specific areas such as: research data management, data re-use and research collaboration? Biography |
| Mary Hobson
Andrew Rohl Rob Cook Ian Gibson Bill Appelbe Nathan Bindoff |
Australian eResearch Organisation – a Nationwide Collaboration for Research Infrastructure & Services High performance infrastructure and services are essential for the effective conduct of globally significant research in many disciplines. Their impact is spreading rapidly as modeling, simulation, visualisation, information-centric computing, collaboration and other applications of computing open up new possibilities for innovative research. The NRIC (National Research Infrastructure Council) Roadmapping exercise conducted during 2011 is planning the next stages in national support for what are now essential platforms for advanced research. Access to infrastructure and services is promoted through institutional information technology services (ITS) and eResearch services groups and CAUDIT; through the regional eResearch service providers established in each state; and through the Commonwealth organisations established under the NCRIS and EIF Super Science programs, collectively known as the Platforms for Collaboration. To date these separate entities have collaborated loosely to develop and promote the facilities available to researchers. To enable greater research impact through broad access to high performance infrastructure, the eResearch service providers have organized themselves into AeRO. This presentation “launches” AeRO to the national eResearch community, presents its program and opens membership to any organisation involved in the provision and support of eResearch services. Biography – Mary Hobson Biography – Andrew Rohl Biography – Nathan Bindoff Biography – Bill Appelbe Biography – Rob Cook Biography – Ian Gibson |
| Nick Horspool
Jeffrey Johnson Ben Evans |
From the sea to the clouds: TsuDAT, a community-based tsunami simulation application hosted in the cloud TsuDAT, the Tsunami Data Access and Simulation Tool, is a new approach to tsunami inundation simulation that farms out computationally intensive tsunami simulations to a computing cloud. TsuDAT consists of a web-based mapping application, in which users can explore their offshore tsunami hazard and easily build detailed tsunami inundation simulations, and a backend that runs an open-source hydrodynamic modelling application on virtual machines in the cloud. This approach allows non-modellers, such as emergency managers, planning officials and land-use planners, to create detailed tsunami inundation hazard maps and assess the tsunami risk for coastal communities. The benefit of this approach is that users do not need specialist software or high performance computing resources installed locally. In addition, the scalability of cloud computing increases the supply of computational resources as demand increases. TsuDAT aims to transfer the capability of computational tsunami simulation from a handful of experts to a much wider community of modellers located in state and territory governments around Australia. Biography – Nick Horspool Biography – Jeffrey Johnson Biography – Ben Evans |
| John Houghton
Greg Laughlin |
Costs and benefits of data provision Over the last decade there has been increasing awareness of the potential benefits of more open access to Public Sector Information (PSI), research publications and research data both within Australia and around the world. That awareness is based on economic principles and evidence, and it finds expression in policy at organisational, national and international levels. Government and research policies seek to optimise innovation by making publicly funded data available for use and re-use with minimal barriers in the form of cost or convenience. This confers three responsibilities on publicly funded agencies: (i) to arrange stewardship and curation of their data, (ii) to make their data readily discoverable and available for use and re-use with minimal restrictions, and (iii) to forgo fees wherever practical. This paper reports on the findings of a study (currently underway) which: presents case studies examining the costs and benefits involved in making publicly funded data freely available for the agencies and their users; estimates the wider impacts of making publicly funded data available; and draws out lessons for the research sector regarding the curation and open sharing of research data. Biography – John Houghton Biography – Greg Laughlin |
| Andrew Isaac
Sam Morrison |
Karaage – Cluster account management and reporting In just over 12 months, the Victorian Life Sciences Computation Initiative (VLSCI, http://vlsci.org.au) has gained over 300 users and 80 projects, and has collected user feedback and usage data for four quarterly reports and one annual report. This has all been done through Karaage (https://code.vpac.org/trac/karaage), an online account management and reporting system developed at the Victorian Partnership for Advanced Computing (VPAC, http://vpac.org). When a user’s access to the supercomputers has been approved, they are given a Karaage account. This avoids a major impediment to account management (the paper trail required to perform many account management tasks and to collect reporting information) by delegating account management and responsibility to the people best suited to it: the users. Users are assigned different levels of privilege. Through the user portal, for example, a nominated project manager can invite and approve new users, complete reports and questionnaires, and track project resource usage. Systems administrators need only monitor user management and collate report information. This simplifies the entire account management and reporting process for users and administrators. Karaage simplifies the workflow of account management. Built on the Django web framework, it provides web-based account management facilities to administrators and users. Its design allows users to manage their own accounts and projects. This is achieved by providing modules and middleware that connect the various sub-systems of compute resources and their administration systems. This not only streamlines authentication and authorisation, but also provides interaction with various databases to collect and collate information in a readily accessible manner, all through a single web portal. Karaage is developed and supported by VPAC and released under the GPL v3 licence. 
Biography – Andrew Isaac Biography – Sam Morrison |
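The delegation model described for Karaage, where a nominated project leader invites and approves members without administrator involvement, can be sketched as a small role check. This is a minimal, hypothetical Python sketch and not Karaage’s actual Django models or API: the `Role` levels, the `Project` structure and the `invite`/`approve` functions are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical privilege levels, ordered from least to most privileged."""
    USER = 1
    PROJECT_LEADER = 2
    ADMIN = 3


@dataclass
class Project:
    """Hypothetical project record: leaders, approved members, pending invites."""
    name: str
    leaders: set[str] = field(default_factory=set)
    members: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)


def can_manage(actor: str, role: Role, project: Project) -> bool:
    """Admins manage every project; leaders manage only projects they lead."""
    return role == Role.ADMIN or (
        role == Role.PROJECT_LEADER and actor in project.leaders
    )


def invite(actor: str, role: Role, project: Project, username: str) -> None:
    """Record an invitation, provided the actor may manage this project."""
    if not can_manage(actor, role, project):
        raise PermissionError(f"{actor} may not invite users to {project.name}")
    project.pending.add(username)


def approve(actor: str, role: Role, project: Project, username: str) -> None:
    """Move an invited user into the project's membership."""
    if not can_manage(actor, role, project):
        raise PermissionError(f"{actor} may not approve users for {project.name}")
    if username not in project.pending:
        raise ValueError(f"{username} has no pending invitation")
    project.pending.discard(username)
    project.members.add(username)


# Usage: a project leader adds a collaborator without involving an administrator.
proj = Project("genomics", leaders={"alice"})
invite("alice", Role.PROJECT_LEADER, proj, "bob")
approve("alice", Role.PROJECT_LEADER, proj, "bob")
print(sorted(proj.members))  # ['bob']
```

Keeping the permission test in one function (`can_manage`) means administrators retain oversight of every project while leaders are confined to their own, which mirrors the division of labour the abstract describes.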
| Peter Isaac
Calvin Chow Simon Yu Virginia Gutierre |
Transforming Research in Ecosystem Ecosystem research is concerned with Australian ecosystem dynamics: the role of Australian ecosystems in the cycling of water and carbon between biospheric and atmospheric stores, and the response of these ecosystems to changes in these cycles. Effective research is hampered by a lack of coordination in data collection, archiving and quality control across measurement stations in remote Australia, which have been set up independently. Underpinning this initiative was the need for a more collaborative research environment to address global climate challenges. This presentation will review the systems in place that will provide integrated access to research data and facilitate a collaborative approach for researchers by addressing the following key principles:
• standardisation and automation of the data collection, archiving and quality control of measurements from a network of measurement stations;
• integration of complementary data streams from different sources into a single data and metadata repository;
• linking the data into a common research data space, through the Australian Research Data Commons, to encourage re-use of research data.
Biography – Peter Isaac Biography – Simon Yu Biography – Virginia Gutierre |
| Edward King
Ben Evans Lesley Wyborn Wenjun Wu Leo Lymburner Medhavy Thankappan Peter Tan Fei Zhang Mark Gray Joseph Antony Muhammad Atif Matt Paget Stefan Maier Thomas Schroeder |
A National Environmental Satellite Data Virtual Laboratory We have constructed an environment in which different research communities using large remote sensing data sets can coalesce, based on a common platform for data, workflows and analysis tools in a high-performance environment. Earth Observing (EO) sensors carried on space-borne platforms produce large (multiple TB/year) data sets serving multiple research and application communities. The limited overlap between these end-user groups, together with the data management challenges, often leads to fragmentation of data storage and duplication of processing systems and user analysis environments. Moreover, where overlaps exist, they are often difficult to exploit because of specific implementation differences such as agency network firewalls, incompatible storage formats and the degree of intermediate processing. This problem is common across a number of existing satellite sensors and will only get worse as new sensors are launched. A virtual laboratory is a means by which communities can work together to overcome their common problems and focus on their specific research interests. This virtual laboratory has been constructed around both the computing power and the data-intensive cloud facility at the NCI, with the support of both the IMOS and TERN NCRIS capabilities. The result is a scalable platform for collaboration in this data-rich area, with far-reaching interest across the research community. This development facilitates a long-term goal of the remote sensing community: to convert earth observation data into information at the spatial and temporal scales that are relevant to decision makers. Biography – Edward King Biography – Ben Evans Biography – Lesley Wyborn Wenjun Wu, Leo Lymburner, Medhavy Thankappan, Peter Tan and Biography – Mark Gray Joseph Antony and Muhammad Atif are data-intensive computing specialists at the NCI. 
Biography – Matt Paget Biography – Stefan Maier Biography – Thomas Schroeder |
| Qing Liu
Greg Timms Yanfeng Shu Daniel Smith Andrew Terhorst |
Provenance-Aware Automated Data Quality Control As automated data collection has become more commonplace (e.g. through industrial and environmental sensors and sensor networks), the volume of data produced has risen exponentially. To enable such data to be shared and re-used, it is crucial that automated techniques for assessing data quality are also developed. Such techniques have begun to appear in the literature [1, 2] in recent years, combining data statistics and domain expertise to produce data quality flags or estimates of uncertainty. Where the quality of data is assessed by the organisation responsible for collecting the data, these approaches are relatively easy to implement. However, in many circumstances users will unavoidably have to use data provided by a third party. Knowledge of how data quality was assessed therefore plays a critical role in helping users decide whether data is trustworthy and fit for purpose. In this paper we discuss how data provenance can enable proper assessment of an automated quality assessment (QA) process. Biography – Qing Liu Biography – Greg Timms Biography – Yanfeng Shu Biography – Paulo de Souza Biography – Daniel Smith Biography – Andrew Terhorst |
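As a minimal sketch of the idea, assuming a simple range test stands in for a real QA procedure, the Python below produces per-value quality flags together with a provenance record stating which test ran, with which parameters, on which input (identified by a SHA-256 digest). The flag values, the `QCProvenance` fields and the `range_check` function are hypothetical illustrations, not the authors’ QA system.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Illustrative flag values (an assumption, not taken from the paper).
GOOD, BAD = 1, 4


@dataclass
class QCProvenance:
    """Hypothetical provenance record describing how the flags were produced."""
    test: str
    parameters: dict
    input_sha256: str
    run_at: str


def range_check(values, lower, upper):
    """Flag each value GOOD if it lies in [lower, upper], else BAD.

    Returns the flags together with a provenance record, so that a
    third-party user can see exactly how the quality assessment was made.
    """
    flags = [GOOD if lower <= v <= upper else BAD for v in values]
    # Identify the exact input that was assessed by hashing its JSON form.
    digest = hashlib.sha256(json.dumps(list(values)).encode("utf-8")).hexdigest()
    prov = QCProvenance(
        test="range_check",
        parameters={"lower": lower, "upper": upper},
        input_sha256=digest,
        run_at=datetime.now(timezone.utc).isoformat(),
    )
    return flags, prov


# Usage: hypothetical sensor temperatures with a plausible range of -5 to 45.
flags, prov = range_check([12.1, 55.0, 13.4], lower=-5.0, upper=45.0)
print(flags)  # [1, 4, 1]
print(json.dumps(asdict(prov), indent=2))
```

Publishing the provenance record alongside the flags lets a downstream user judge whether, for example, the chosen thresholds suit their own purpose, rather than having to trust an opaque flag.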
| Nicholas May | Implementing eResearch Projects using Agile Development Agile development encompasses a family of methodologies that aim to overcome some problems common to heavyweight, document-centric software development processes. An active area of discussion in e-research relates to the benefits of using agile development for e-research projects. At the e-Research Office of RMIT University we are currently implementing two such projects with an agile process. In this presentation, we discuss why agile may be appropriate for e-research projects. We will describe the agile process that has evolved over the lifetime of the projects, including its practices, tasks and meetings, and discuss the lessons we have learned in implementing it. Finally, we will draw some conclusions about whether and when it may be appropriate to use an agile process to develop an e-research application. Biography |
| Ann Morgan
Mark Baldock |
Research repository models: can one size fit all? University of South Australia Library has developed a number of research metadata repositories in collaboration with other divisions in the University. This presentation describes the repositories, the model used for each, and how this model has been adapted for different types of research. It also describes future developments at UniSA and how the repository model will be utilised. Biography – Ann Morgan Biography – Mark Baldock |
| John Morrissey | Building Data Management services supporting a multi-disciplinary national research organization An update on CSIRO’s development activities in building Data Management platforms supporting multi-disciplinary research. The presentation will include a discussion of DM architecture, current development activities and an overview of planned activities for the next 2-3 years. Finally, CSIRO will be releasing its “Review and recommendations for a data and information management strategy for CSIRO” document as an example of an enterprise planning document for science data management. Biography – John Morrissey |
| Sam Moskwa | The Accelerated Computing Initiative CSIRO is one of the largest and most diverse scientific institutions in the world with more than 50 sites throughout Australia and overseas. A 2007 review of high performance scientific computing (HPC) identified that CSIRO should not only offer HPC infrastructure, but should work closely with researchers to improve its uptake across all scientific domains. How best to provide the required training and engage with distributed researchers was unspecified. Impediments to commencing HPC use include both hardware costs and programming expertise. The Advanced Scientific Computing (ASC) group provides access and support to HPC facilities at no direct cost to CSIRO researchers. In response to the review, the ASC developed the Accelerated Computing Initiative, which provides targeted training and programming support via small seed projects, also at no cost to recipients. Through engaging directly with researchers on such projects, HPC uptake has significantly increased and researchers have achieved improved science results. Biography |
| Trina Myers
Jarrod Trevathan Ian Atkinson Rob Cook Jeremy Vanderwal |
The Tropical Data Hub (TDH) – A virtual research environment for tropical science knowledge innovation and discovery The Tropical Data Hub (TDH) is an e-Research initiative to provide a data hosting infrastructure that brings together significant tropical environmental data sets. Tropical regions support some of the world’s most diverse and unique ecosystems. However, these sensitive areas are coming under increased pressure from human activities, which significantly threaten their sustainability into the future. Therefore, a need exists for more informed use of environmental monitoring procedures to help better manage tropical regions. At present data is collected in disjoint repositories and is not visible or accessible for reuse by other lines of enquiry. Without this data being publicised, many opportunities are missed for holistic discovery of major trends that influence tropical ecosystems. The TDH serves as a focal point for amalgamating disparate data sources to facilitate data reuse, integration, searching and knowledge discovery by environmental researchers and government departments. This will provide researchers and planners with access to extensive and readily available data that can be used to give a more accurate representation of the state of tropical regions and allow more suitable environmental management practices to be devised. We present two visualisation tools that model data from the Tropical Data Hub. The first is for assessing land space across Northern Australia and the second is a system to rapidly assess the potential impacts of climate change on global biodiversity.
|
| Peggy Newman
Nigel Ward Hamish Campbell Matthew Watts Craig Franklin Jane Hunter |
OzTrack: Data Management and Analytics Tools for Australian Animal Tracking Studying animal movements is of critical importance when addressing environmental challenges such as invasive species, infectious diseases, and climate and land-use change. The number of species tracking projects in Australia is rapidly expanding, due both to the reduction in the cost of tracking devices (radio, acoustic and satellite) and to the need for ecology management communities to study the behaviour of species across taxa, space and time. The high-resolution sensor and tracking devices deployed to monitor species typically generate very large datasets which can be difficult to interpret without advanced analytical computing and visualization tools. Much of the animal tracking data collected within Australia is not analysed or stored in an efficient and systematic manner, and as a direct result data loss and study repetition are common. The aim of the OzTrack project is to develop the critical data management infrastructure needed to support the animal tracking research community. The project is developing three software components: a central repository for the data and metadata being generated; a set of analysis, modeling and visualization services; and a Web portal interface that enables scientists to search, retrieve, analyse and visualize the data.
|
| Liam O’Brien | CSIRO eResearch Architecture CSIRO is a large, geographically dispersed research organisation with over six thousand staff at fifty-six sites. Scientists at CSIRO carry out research in various domains and deal with large, complex datasets, and need to collaborate effectively and efficiently. In response to these issues (the geographical separation of staff, increasing data complexity and the need for effective collaboration), an architecture for software and systems that supports eResearch within CSIRO has been developed and is evolving to take advantage of new opportunities to support the work of scientists within the organisation. Several projects have already been completed and several are underway to build the underlying infrastructure and software systems that support eResearch. Several significant areas are being addressed within the eResearch Architecture, including a Research Data Management Service, support for electronic Laboratory Notebooks, and support for eTools/Scientific Workflow. These systems use underlying infrastructure which includes advanced scientific computing, visualisation and imaging, data storage, networking, collaboration tools and cloud computing. Developing an eResearch Architecture for CSIRO poses several significant challenges: the diversity of research domains and the needs of the scientists within each domain; the introduction of new technology and approaches within the organisation, and the cultural change that is needed in some cases; the scalability and usability of the solution architectures for the various systems that are developed; and integration and interoperability across the diverse set of systems and existing technologies used by scientists within the organisation. Biography – Liam O’Brien |
| Rebecca Parker
Dana McKay Terrence Bennett |
Lessons for data sharing from institutional repositories Governments and institutions are increasingly interested in promoting open sharing of research data through institutional repositories: showcasing quality research data brings prestige to institutions and gives governments a visible return on financial investment in research and development. While the incentives for open data are clear for institutions and governments, any attempt to create an open data climate depends on the researchers who will choose to share their data (or not). Early attempts to foster an open data movement have met with little interest or action on the part of researchers, a result reminiscent of early attempts to recruit publications to institutional repositories. In this paper we draw on the institutional repositories literature to identify four major barriers to open data sharing: (dis)incentives, difficulty, danger, and existing disciplinary sharing practices. To change practice (and data sharing would be a major change for many disciplines), sufficient incentives must be in place to overcome old habits. There is currently little reward for researchers in data sharing: the risks are high and there are no research metrics available for measuring the impact of shared data. With so little incentive, the barrier to participation must be very low; however, data sharing and curation are difficult at best. There are no standard ways to describe data, meaning cataloguing is taxing both for researchers and for the repository librarians who would assist them. Not only is data sharing low-benefit and difficult, it is threatening to researchers: it may alienate their participants, and research data could be misused or misinterpreted. Finally, those who already share data in their own disciplines are unlikely to be willing to change their practice to meet institutional requirements: it is simply not worth it to them. 
The institutional repository literature highlights all these problems, and may even provide insight into some solutions. Biography – Rebecca Parker Biography – Dana McKay Biography – Terrence Bennett |
| Kevin Pulo
Ben Evans Deborah Mitchell Steven McEachern |
Panemalia: visualising longitudinal datasets at the Australian Data Archive Longitudinal surveys are a very rich form of social science data, often containing a wealth of as-yet untapped hidden knowledge. However, such datasets are typically examined using only analytic techniques and simple graphs. We believe that much more can be done in the analysis and exploration of such fertile datasets. Panemalia is the application of an advanced visualisation technique to longitudinal survey data. It is a highly interactive DHTML application that is integrated with the data repository at ADA, is accessible to social science users who are not IT-savvy, and supports the requirements of data familiarisation, exploration and quality assurance. Biography – Kevin Pulo Biography – Ben Evans Biography – Deborah Mitchell Biography – Steven McEachern |
| Robyn Rebollo
Michael Haugh Simon Musgrave Xiaobin Shen |
Sustainable Solutions to Intellectual Property and Ethical Complexities in Building a National Corpus A myriad of complicated legal and ethical issues have arisen from the Australian National Corpus (AusNC) project as an outcome of making large amounts of language data available to other researchers and the public, where permissible. This presentation will discuss the steps taken by the AusNC Project to ensure that both legal protections and ethical considerations are dealt with for collections intended for inclusion in the AusNC. Biography – Robyn Rebollo Biography – Michael Haugh Biography – Simon Musgrave Biography – Xiaobin Shen |
| Matthias Reumann
Andreas Pflaumer Coeli M Lopes Blake G Fitch Michael C Pitman Changhuan Kim Simon Wail Stephen Moore Ryan Hoefen Arthur J Moss Jin O-Uchi Christian Jons Scott McNitt Wojciech Zareba Ilan Goldenberg David Abramson John J Rice |
Clinical Application of Cardiac Modelling: a Need for Supercomputing? Cardiac models are among the most mature biophysical models, with research going back to Hodgkin and Huxley’s work on mathematical models of excitable membranes in the 1950s. Since then, the field has advanced to a stage where clinical application of cardiac modelling can be conceived. Multiscale, multiphysics cardiac models with a high degree of detail at both the molecular and organ levels require large computing resources. Having said that, we have recently developed a model to support risk stratification of long QT 1 patients that does not necessarily require supercomputing resources. This raises the question of whether supercomputing is required for cardiac modelling to have a clinical impact. We find that clinical impact can be achieved with cardiac computer models that do not require the supercomputing capabilities of systems with thousands or tens of thousands of cores. However, when investigating patho-physiological processes at the organ level that take place over hours, such as blockage of a coronary vessel that causes ischemia and infarction, or when investigating electro-mechanical processes, multiscale, multiphysics models of the heart are required; these demand supercomputers to achieve simulation times that can be integrated into clinical workflows. Biography – Matthias Reumann |
| Matthias Reumann
Kathryn E Holt Michael Inouye Tim Stinear Benjamin W Goudey Gad Abraham Qiao Wang Fan Shi Adam Kowalczyk Adrian Pearce Andrew Isaac Bernie J Pope Helmut Butzkueven John Wagner Stephen Moore Matthew Downton Philip C Church Steve J Turner Judith Field Melissa Southey David Bowtell Daniel Schmidt Enes Makalic Justin Zobel John Hopper Slave Petrovski Terence O’Brien |
Precision Medicine: Dawn of Supercomputing in ‘omics Research People vary greatly in their underlying genetic risks of diseases and in their responses to treatment for these diseases. Unpredictability of treatment outcomes results in significant personal and societal costs. Complex networks of gene regulation, gene-gene and gene-environment interactions have replaced the notion that a single gene is causative of a disease or trait. Quantifying personalized risks, devising prevention strategies and optimizing drug responses are major challenges for the application of Precision Medicine. Many research groups worldwide have been working diligently over the last few decades to develop epidemiological and clinical resources of biospecimens from large, well-characterised population-based and clinic-based samples of cases, controls, twin pairs and families, with the aim of performing analyses using complex modelling in Precision Medicine based on genomic, transcriptomic and proteomic data. However, it is currently impossible to carry out any but simplistic analyses on these large data sets due to a lack of computing power and memory, and therefore the full utility of the resources and technology has not been realised. We will discuss current computational challenges in Precision Medicine and propose the use of supercomputing resources to tackle these challenges. In particular, we will present two approaches to whole genome comparison that are of critical importance but currently computationally prohibitive: multiple sequence alignment and the detection of single nucleotide polymorphism (SNP) interactions. Both approaches use the massively parallel, distributed memory supercomputer at the Victorian Life Sciences Computation Initiative (VLSCI). Biography Biography – Dr Reumann |
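A short worked count may help show why exhaustive detection of SNP interactions becomes computationally prohibitive. Assuming only pairwise interactions are tested exhaustively, the number of SNP pairs among $n$ genotyped SNPs grows quadratically; the panel size of 500,000 SNPs used below is a hypothetical figure chosen for illustration, not one taken from this work.

$$
N_{\text{pairs}} = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
n = 500\,000 \;\Rightarrow\; N_{\text{pairs}} = \frac{500\,000 \times 499\,999}{2} = 124\,999\,750\,000 \approx 1.25 \times 10^{11}.
$$

Each of these pairs requires its own statistical test over every sample, and testing higher-order interactions grows faster still, as $\binom{n}{k}$ for $k$-way interactions.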
| Anna Shadbolt
Ann Borda |
Building training into the value proposition of eResearch Advances in technology have accelerated the rate of research outputs. In spite of this, the capacity of research organisations to build and maintain the eResearch infrastructure required to enable researchers to maximise benefits from emerging technologies continues to lag behind the pace of innovation. The importance of the human e-Enabling component of this research infrastructure is well acknowledged and valued, yet institutions continue to struggle to build sustainable programs of eEnablers across their research communities. The Victorian eResearch Strategic Initiative (VeRSI) was established in 2006 and funded by the Victorian Government to accelerate and coordinate the uptake of eResearch in universities, government departments and other research organisations. In 2010, the VeRSI team commenced a review of the role that education, training and outreach (virtual and face-to-face) could play in the enhancement of eResearch outcomes. For VeRSI, outreach and training provision has been mostly opportunistic, usually coinciding with a visit from an international expert or a large regional/national meeting opportunity. This approach has been well received to date, but it is unclear whether the impact would be greater with a more targeted implementation strategy. The focus of this paper is our investigation of the feasibility of embedding training and outreach into the project planning and delivery cycle, i.e. a greater coupling between training, outreach, communication planning, and project delivery. Embedding training and communications in project planning and delivery is not new in large-scale transformational technology-based projects. eResearch enabling projects generally focus on ‘innovators’ and ‘early adopters’. 
As with research more broadly, eResearch project delivery is highly variable and, depending on the project, partner requirements and deliverables are not always clearly articulated. Leveraging project success for expanded impact could support benefits beyond the life of the project. VeRSI is looking at ways to build in ongoing Partner benefits beyond the life of a project by including outreach and training in the project delivery process. This is a shift in emphasis from ‘product creation’ to ‘value creation’ as the prime focus of project outcomes. Value creation is supported by education, training and outreach, and will be used to enhance the positioning of products and services both within and across Partner institutions. Working with Partners to communicate the value propositions of new technology from the perspective of actual and potential users should enhance uptake of that technology and support business ownership by sustaining the change required to shift the balance in eResearch uptake. In this paper we will provide examples of how these activities are evolving and what we are learning along the way as we take eResearch uptake to the eXtreme. Biography – Anna Shadbolt Biography – Ann Borda |
| Richard Sinnott
Anthony Stell |
A Virtual Research Environment for International Adrenal Cancer Research For many research areas, the need to collaborate across organisational and, in certain cases, national boundaries is essential. This is especially the case when dealing with rare diseases, where a lack of data, information and/or shared expertise can delay progress in understanding, diagnosing and treating such diseases. Research into adrenal tumours, understanding their different molecular mechanisms and, in turn, developing targeted personalized treatments is one such area where co-ordination of international cancer efforts is essential. The European Network for the Study of Adrenal Tumours – Structuring clinical research on adrenal cancers in adults (ENS@T-CANCER – www.ensat-cancer.eu) project has been funded by the European Union to establish a state-of-the-art Virtual Research Environment (VRE) supporting all aspects of international research and collaboration into the aetiology and diagnosis of adrenal cancer and the establishment of optimal treatment strategies for patients. In developing this platform it is essential that access to clinical and biological data (samples) is strictly controlled according to ethical arrangements. This presentation outlines the goals of the ENS@T-CANCER project and describes the on-going implementation work. We show how security-oriented information can be collected and tracked through the VRE, including supporting the collection of clinical data sets and their linkage with associated bio-samples in an ethically-driven framework. We also outline how this project is expected to shape many related efforts around the Parkville Precinct, where clinical and biological matchmaking services across a range of clinical research areas are to be supported. Biography – Anthony Stell Biography – Richard Sinnott |
| Richard Sinnott | Classifying Data Sharing Models for e-Health Collaborations Seamless access to clinical and biomedical data sets is the cornerstone upon which the vision of e-Health depends. A multitude of projects and initiatives developing e-Health infrastructures providing access to a range of clinical and biomedical data sets have occurred [1-3]; however, by and large, no clear consensus on the best way to build e-Health infrastructures has been established. Rather, different projects and initiatives have typically developed their own software solutions for their own particular needs, and recycling of existing systems has been the exception rather than the rule. This is not surprising in many respects, given the heterogeneity of many existing clinical systems, the rapid evolution taking place across the post-genomic (genomics, proteomics, metabolomics etc.) space and the numerous advances in imaging and diagnostic techniques. However, it is clear that the future success of the e-Health vision and its translation to personalised medicine, improved healthcare and the many other opportunities identified in the post-genomic age depend upon lessons learnt in developing e-Infrastructures, and ultimately upon being able to classify and compare solutions. Ideally a common architectural framework would exist by which reference implementations could be compared, much like the OSI protocol stack. However, no such overarching framework exists, and it would appear that, at least for the foreseeable future, e-Health infrastructures are likely to remain largely ad hoc and uncoordinated across different communities in different countries. In this context, establishing best practice and comparing different solutions is non-trivial, as they are typically developed for different scenarios and with different communities involved. 
The aim of this presentation is to structure the discussion of e-Health infrastructures through fundamental architectural data sharing patterns that are at the heart of many kinds of e-Health collaboration. Thus, whilst no single common architecture for e-Health infrastructures exists, common patterns of data sharing do exist to support e-Health collaborations – at least at an architectural/conceptual level, as opposed to the lower-level implementation patterns found in the work of Gamma [4]. We identify such patterns and outline their advantages and disadvantages. In describing these patterns we do not focus in detail on the technologies used to implement them per se; rather, our focus is on the fundamental nature of the data sharing and collaboration models they support. Each pattern is illustrated with an exemplar project. It is intended that this classification will help shape future e-Health infrastructure discussions, provide a basis for comparing solutions, and offer insight to others developing their own e-Health solutions. Biography – Richard Sinnott |
| Richard Sinnott
Martin Tomko Gerson Galang Robert Stimson |
Towards an e-Infrastructure for Australian Urban Research Australian urban and built environment research covers a multitude of research disciplines investigating social, economic and physical phenomena at a multitude of spatial and temporal scales, across diverse aggregation levels from individuals through to cohorts and populations, and across a range of scenarios, e.g. public health, voting patterns, traffic, energy and water. The development of a common software platform (e-Infrastructure) meeting the needs of such research communities must tackle many challenges associated with data-intensive areas of research. These include dealing with data sets from a multitude of federal, state, municipal, academic and private institutions, all of which hold vast arrays of heterogeneous data. For many researchers these data sets are difficult to discover, access, interrogate and use more generally. It is also unrealistic to expect researchers always to have the technical capability and capacity to handle such large amounts of diverse data, to develop data processing tools that make use of such data sets, or indeed to run computationally intensive simulations and models based on these data sets. Islands of expertise and islands (silos) of data currently exist that have fragmented urban research and thwarted a holistic approach to the study of the Australian urban and built environment system. Biography – Martin Tomko Biography – Gerson Galang Biography – Robert Stimson Biography – Richard Sinnott |
| Rod Harris
Jon Smillie Ben Evans |
An archive for optical astronomy To ensure the ongoing availability and use of data products from various optical astronomy projects throughout Australia, a national astronomy data archive has been created. This data archive includes dedicated resources for hosting and long-term support of nationally significant datasets, and a suite of Virtual Observatory (VO) web services implemented on top of these datasets, designed to allow scientists to discover, access and analyse various observations in a consistent fashion. In this paper we will present the work done in implementing selected VO services for the SkyMapper, WiggleZ and GAMA projects. Biography – Jon Smillie Biography – Rod Harris Biography – Ben Evans |
| Salim Taleb
Peter Hicks |
Connecting the dots to unify research data and metadata Curtin University is developing a research data management system that aims to provide researchers with the tools necessary to plan, create, store, access, share, describe, archive and curate their research data. The components of the research data management system are built on the core principle of information reusability. Each component will create, manage and propagate information that can be utilised or reused by other components of the system. This interconnection provides more value than the individual tools do on their own. The three major components forming the data management system are the data management planning tool (DMP); the data management layer, notionally labelled the Research Data Portal (RDP); and a metadata management system named the Metadata Hub. The DMP will assist researchers in determining their requirements for data capture, storage, access, reuse, ownership, archival and preservation. The RDP is a data management layer overseeing a number of data storage solutions. The information gathered in the DMP will be utilised by the RDP to initiate data storage for awarded grants, creating a data storage location with default access and security rights, file and folder structures, and default collection-level descriptions. The RDP contains an engine that utilises the DMP information to recommend or create default connections to storage solutions. The Metadata Hub will be an aggregator of information about research data (from the RDP), the researchers and their respective projects (from other institutional systems). This design enables domain-specific data capture workflows and systems to be integrated with institutional metadata and data capture channels. Biography – Salim Taleb Biography – Peter Hicks |
| John Taylor | The CSIRO eResearch Strategy: Transforming the way research is done in CSIRO In order to participate in research that is increasingly enabled by ICT infrastructure, CSIRO has developed a strategic framework for developing its enterprise-wide capabilities in the areas of Data Management; Scientific Computing Infrastructure; Advanced collaborative & visualisation environments; and Scientific tools and services. The CSIRO eResearch technology and infrastructure strategy is improving and transforming the way that research is conducted in CSIRO. In this presentation we will provide an update on the progress of implementing the CSIRO eResearch strategy. Biography |
| Kerry Taylor
Michael Compton Laurent Lefort |
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Ontology The ecological and agricultural sciences, industrial processes, and consumer gadgets are increasingly relying on live data streams generated by large numbers of heterogeneous sensors to deliver knowledge and services. All the traditional problems of data management and data integration arise in this context of real time data, plus a few more. Semantic technologies are being rapidly adopted for traditional data management and data integration problems, and there are many international research projects now using semantic technologies for sensor network data management. The World Wide Web Consortium (W3C) established an Incubator Group (SSN-XG) in March 2009 to develop ontologies for describing sensors and methods for using those ontologies for annotation, especially in the context of the Open Geospatial Consortium’s (OGC) Sensor Web Enablement standards. The Group completed its work in June 2011 with the publication of the final report, including the SSN OWL 2 ontology, use cases, extensive documentation and several worked examples. We present the ontology and some of the ways it is being used. Biography – Kerry Taylor Biography – Michael Compton Biography – Laurent Lefort |
| Joe Thurbon | Lessons from Intersect’s Engagement Experience Over the last three years Intersect has grown from a single employee to approximately forty. One constant during that period is that every non-trivial project and service we have provided has been based on highly interactive engagement with the research community: approximately one quarter of our staff are dedicated to engagement, and are embedded across our member organisations. Even within that constancy, we have tried many approaches, both strategic and tactical, to engage the research community. We’ve learned many lessons – some confirming our suspicions, some confounding our expectations. This presentation summarises the lessons we’ve learned, with a view to sharing our experience with the wider eResearch community. It will cover issues such as the challenges of being a distributed organisation, the importance of being research-driven vs technology-driven, the need for tailoring the engagement model to the individual needs of members, and other insights we’ve gained to maximise the value we deliver to our members. Biography |
| Conal Tuohy
Abigail Belfrage |
Public Record Office Victoria Crowdsourcing Transcription Project Public Record Office Victoria (PROV) and the Victorian eResearch Strategic Initiative (VeRSI) are collaborating on the pilot development of an online crowdsourcing transcription platform, through which members of the public can access images of public records and transcribe, tag and geo-locate them. The collaboration represents a melding of the “Gov 2.0” and “eResearch” strategies of the two organisations, and makes use of the emerging cultural activity of crowdsourcing, which has the potential to create a valuable public information resource and a rewarding experience for participants. Biography – Abigail Belfrage Biography – Conal Tuohy |
| Paul Walk | Developer Community Supporting Innovation (DevCSI) This presentation will describe all this work, outlining some of its successful outputs and, in particular, demonstrating its relevance to publicly-funded research. The presentation will also outline some plans for the future, and indicate some opportunities for international collaboration – with an open invitation to delegates at the conference to engage with us and explore this potential. Biography – Paul Walk |
| Nigel Ward
Tung-Kai Shyy Syed Irfanullah Friska Dhen Ungkara |
Analysis and Visualisation Tools for Spatially Integrated Social Science The field of Spatially Integrated Social Science (SISS) recognises that much of the data that social scientists examine has an associated geographic location (for example, a survey respondent’s location). SISS systems use this geographic information as the basis both for integrating heterogeneous social science data sets and for visualising the results of analyses. However, sourcing data sets, understanding relationships between the data and the geography, and implementing appropriate statistical analysis techniques are all time-consuming and highly skilled processes. The UQ SISS System project aims to relieve social scientists of this burden. The project is developing online tools that allow researchers to quickly access rich Australian socio-spatial datasets (e.g. voting outcomes and census data), conduct statistical modelling and visualise spatial relationships between the results.
|
| BoFs | |
| Andrew Alexander | Mobility Research Tools Roundtable. “A look at the current activity in the Research Sector with Mobile Devices and what they need to address within the changing research environment”. Mobile devices are increasingly becoming a daily tool for improved productivity: email, messaging, personal reference, reminders, notes, recording voice, pictures or video, and accessing web-based applications, all from a pocket device. How is the research sector using mobile tools, and what applications are mobile tools providing to assist research outcomes? What is the future role for mobile devices in the research sector, and what direction should future development take to meet the sector’s needs? The BoF will hear from presenters about a number of projects involving specific mobile applications and the value they have delivered in their work. This will be followed by a panel discussion with the audience to examine the likely future direction and sector needs for mobile devices, and to make recommendations on what should be developed in future. Biography |
| Kylie Bailin
Joanne Croucher |
Open and shut (or not): Conversations about data, access and openness In this BoF we seek engagement with researchers and eResearch professionals to explore different approaches to having conversations about research data access and reuse. Rather than typifying open data as an all-or-nothing dichotomy, the discussion will be framed around the idea of a ‘continuum of openness’. Key areas to be explored include research communities’ expectations of reciprocity, and the changing expectations of funding agencies and publishers. Another topic for discussion is the current and future roles for libraries, data librarians and eResearch intermediaries in research data management. One of the biggest hurdles in beginning the eResearch discussion with researchers is explaining this spectrum of open data and quelling fears that all data will have to be completely open. This discussion will look at the complexities involved in supporting researchers and informing them about the different levels of openness. This BoF will also look at education and training as they relate to open data, and at building capabilities among both support professionals and researchers. Biography – Kylie Bailin Biography – Joanne Croucher |
| Andrew Cheetham
Andrew Leahy Peter Bugeia |
Starting up in eResearch? How to Hit the Ground Running The session aims to provide young research institutions that are incubating an eResearch capability with practical advice on how to get things rolling in the right direction while delivering early value to researchers. It is hoped that individuals from institutions that have already been through the startup period will attend the session to share their experiences. Biography – Andrew Cheetham Biography – Andrew Leahy Biography – Peter Bugeia |
| Anne Cregan
Joe Thurbon Bill Appelbe Peter Blain Ann Borda Graham Chen Luke Edwards Mary Hobson Phil Tannenbaum |
eResearch State Agencies This Birds of a Feather session is for the purpose of sharing approaches and experience between the various Australian state-based eResearch agencies – Intersect, QCIF, VeRSI, VPAC, TPAC, eRSA and iVEC. It provides an opportunity for those on the frontline of eResearch to find out how the other eResearch agencies are approaching engagement with the research community and national initiatives, and to discuss alternative models for providing eResearch services and products and their pros and cons. The key goal of the BoF is for those doing engagement at the State level to network with, learn from and provide feedback to others engaged in similar activities in other states. Biography – Dr Anne Cregan (Convenor) Biography – Dr Joe Thurbon Biography – Dr Bill Appelbe Biography – Dr Ann Borda Biography – Dr Peter Blain Biography – Dr Graham Chen Biography – Luke Edwards Biography – Mary Hobson Biography – Phil Tannenbaum |
| Glenn Moloney
Steve Manos Tom Fifield Bernard Meade |
The NeCTAR Project and Programs: Clouds, Apps and Virtual Labs Participants in this session are invited to discuss the NeCTAR Project, including the four NeCTAR Programs (Virtual Laboratories, eResearch Tools, the Research Cloud and National Server Programs), with a focus on the experience and lessons learned from the first node of the Research Cloud at the University of Melbourne, including deploying the first node of the Research Cloud at the University of Melbourne, and research applications in the cloud, i.e. developing an ecosystem of cloud apps. Biography – Glenn Moloney Biography – Dr Steven Manos Biography – Bernard Meade Biography – Tom Fifield |
| David Fulker
James Gallagher |
OPeNDAP Server-Side Capabilities and Other Supports for Data-Intensive Science This session will afford attendees opportunities to hear about and influence the roadmap being charted for OPeNDAP’s future. A major focus will be on increased server-side functionality in those client-server systems built around the (evolving) DAP protocol. Four BoF segments will cover key areas of advancement: extended forms of server-side subsetting, to fully embrace non-rectangular meshes and so-called unstructured grids (UGRIDs); support for user-specified inventories of OPeNDAP-accessible data sets; increased compatibility and commonality between OPeNDAP’s Hyrax and Unidata’s THREDDS Data Server (TDS); and the impact of cloud computing on needed data services. Attendees will be asked to describe use cases and provide other feedback on the likely utility of the advances being considered. Biography – Dave Fulker Biography – James Gallagher |
| David Groenewegen | ANDS Projects BoF During 2010-11, ANDS has been undertaking projects through its Seeding the Commons, Public Sector Data and Data Capture Programs, in partnership with a large number of Australian universities and research bodies. These projects are designed to improve the management of research data and to encourage the development of the Australian Research Data Commons. This Birds of a Feather session is designed to gather together all of those people who are engaged in ANDS projects, whether in universities or other research organisations or ANDS itself. Others are welcome to join in. Whatever your background, we would like you to come along and tell us about your project, share your experiences and link up with other members of the wider ANDS community. Biography |
| Kerry Kilner
Jonathan Bollen Richard Maltby Deb Verhoeven Ross Harley |
Humanities and Creative Arts eResearch Consortium: A BoF for Practitioners This BoF session focuses on the requirements, aspirations and opportunities for collaboration between research databases containing content relating to the humanities and creative arts sector in Australia. It is designed to be a useful brain-storming event that will enable the identification and articulation of the similarities, differences, overlaps and tensions between a range of research infrastructure initiatives that serve research activities and information provision in the humanities. Biography – Jonathan Bollen Biography – Ross Harley Biography – Kerry Kilner Biography – Richard Maltby Biography – Deb Verhoeven |
| Valerie Maxville
Lyle Winton Markus Buchhorn Sam Searle Anna Shadbolt Belinda Weaver |
eResearch Education and Training eXtreme Research not only pushes the boundaries of technology, it challenges and extends techniques for data collection, analysis and communication. As new infrastructure is made available, we need to update researcher and support staff skills to fully utilise these resources. While the majority of researchers may not be eXtreme, their increasing reliance on technology throughout the research process is pushing the boundaries of education and training. With (near) zero resourcing, we struggle to keep up with the rapid change occurring across all research disciplines. With eResearch becoming mainstream, we face an eXtreme workforce development challenge: reskilling researchers, growing specialised research staff and updating graduate attributes and the academic curriculum. This BoF continues the conversation (2008-2010) on how to address these common issues through collaboration. Biography – Valerie Maxville Biography – Markus Buchhorn Biography – Lyle Winton Biography – Sam Searle Biography – Anna Shadbolt Biography – Belinda Weaver |
| Teula Morgan
Lyle Winton |
User-facing Data Services and Capability Building – Institutional Development (BoF) We invite people to the second Birds-of-a-Feather discussion on user-facing data services and the underlying institutional models for building research data capability. In 2010 we were at an exploratory point in the development of research data services, helped along by the stimulus of external funding and attempts to engage with our research communities around research data. Based on feedback, we propose to meet again in 2011 to discuss what we’ve learnt, the service models we’re implementing, what has worked and what hasn’t, and how we’re building sustainable capabilities within our institutions. This BoF will consist of a summary of the 2010 discussion, followed by several two-minute summaries from people across the eResearch and ANDS community on how eResearch is supported in their institutions, lessons learned, and good and bad ideas. Presentations will include eResearch community members from Curtin University, CSIRO, Monash University, Queensland University of Technology, Swinburne University of Technology, University of Queensland and Victoria University, with more expected and all welcome! The short presentations will again be followed by an open discussion. We would like to discuss commonalities and differences in our approaches, and whether models have matured enough to form communities of interest and/or good practice. Biography – Teula Morgan Biography – Dr Lyle Winton |
| Tim Pugh
Ben Evans Lesley Wyborn |
Harmonizing Spatial Data Services for Earth and Environmental Science Applications in Data Clouds and Petascale Computing Spatial information and data service providers are building software service stacks and computing infrastructure for specific community HPC use cases and for requirements such as data staging for analysis, visualization and modelling, aggregation services, server-side processing, web processing services, and virtual laboratories. The intent of the BoF is to bring together leading spatial information service providers and data producers from a variety of communities to disseminate knowledge about current service architectures, to discuss desired service and data interoperability and features within high-performance data and computing environments, and to seek feedback from the user community. Representatives from data producers/collectors will discuss their data product formats and services, and interoperability with service providers and user communities. The representatives will range from collectors of large-volume remotely sensed data sets to collectors of small-scale data sets that store precise measurements of real-world phenomena. Of further interest is identifying data products and services that could not be accommodated, and why. Representatives from the service providers will discuss their business drivers and community use cases, software stack design, and the ICT infrastructure that influenced the design of the software services and requirements for data producers or other service providers. Of further interest is the need to identify use cases that could not be accommodated by the software services, data producers or service providers, and why. Representatives from the user community will present use cases for accessing data products and services from either service providers or directly from data producers/collectors. Of interest is the users’ current assessment of services and products, and their future needs for services and products. 
Biography – Tim Pugh Biography – Ben Evans Biography – Lesley Wyborn |
| Richard Sinnott
Martin Tomko |
Australian Urban Research Infrastructure Network This BoF will provide an overview of the $20m EIF SuperScience Australian Urban Research Infrastructure Network (AURIN – www.aurin.org.au) project. It will provide a demonstration of the existing systems and outline how the work is progressing based on feedback from expert groups and the urban and built environment community at large. It will also outline the plans for the future for the work as a whole. Biography – Martin Tomko Biography – Richard Sinnott |
| Joe Thurbon
Anne Cregan Bill Appelbe Paul Coddington Rob Cook Luke Edwards Paola Petrelli Phil Tannenbaum |
Epic Fails in eResearch Those involved in research understand that failure is par for the course. eResearch brings together research and software, applying technology to a rapidly changing, constantly evolving landscape. In this challenging environment, the software development process has the capacity for many and varied epic fails, and examples of failed services and infrastructure also abound. The purpose of this BoF is to provide a forum for a free and frank discussion of our most epic failures as eResearch organisations. We will laugh, cry and, most importantly, learn from our mistakes. Presenters representing Australian state-based eResearch agencies will each describe a project, service or process that has spectacularly not worked, and share the lessons learned from the failure. As the airline industry illustrates, careful and intense scrutiny of disasters and their underlying causes is extremely fertile ground for identifying problems and issues, and provides a platform for making systematic improvements to standard operations and procedures. The goal of this BoF is to learn from one another’s mistakes and to gain the insights of others regarding our own. Biography – Dr Joe Thurbon (Convenor) Biography – Dr Anne Cregan (Convenor) Biography – Dr Bill Appelbe Biography – Dr Ann Borda Biography – Dr Paul Coddington Biography – Dr Rob Cook Biography – Luke Edwards Biography – Dr Paola Petrelli Biography – Phil Tannenbaum |
| Belinda Weaver
Nigel Ward Suzanne Morris |
Joining the Dots We invite people to a Birds-of-a-Feather discussion on how best to support research data management within a university. Rather than trying to create individual, isolated services, we propose a 'join the dots' approach that builds a seamless service from a web of referrals, advice and support across a range of units, underpinned by practicable university policies and procedures. We believe this is the best approach for a large, decentralised, multidisciplinary research university, but it would also suit smaller, more centralised universities. Biography – Belinda Weaver Biography – Dr Nigel Ward Biography – Dr Suzanne Morris |
| Lesley Wyborn
Bryan Heidorn Andrew Treloar |
Dark Data and the Long Tail of Science The increased use of instruments, including sensor networks, is enabling the collection of large-volume, increasingly high-resolution scientific data sets. Many of these are collected by airborne or satellite instruments, and the resulting data sets serve as proxies for real-world phenomena (e.g. remote sensing satellite images can serve as a proxy for vegetation types). New petascale computational infrastructures enhance the capability to model and simulate using these large-volume data sets (Big Science). However, to be of value, many of these large-volume data sets need to be calibrated against precise measurements of point-located sample data. Unfortunately, these observational data are small in volume and are collected by many individual researchers as part of a multitude of sampling campaigns (Small Science). Large-volume data sets are usually collected by a few specialized but well-funded research teams, who must follow good data management practices in order to manipulate, share and reuse their data. In contrast, the data from many small science projects are termed 'dark data' because they are rarely indexed, stored and described in a way that allows them to be reused. Once the research paper has been written, the scientist rarely has the resources or the incentives to ensure that the underpinning data are preserved so that they can be reused by others and/or aggregated into more significant national-scale data sets. Most initiatives to preserve and store data focus on large-volume, homogeneous, file-based data sets, which can be petabytes in size. Although these large-volume data sets are expensive in hardware, the software storage infrastructures that facilitate their reuse and repurposing are relatively cheap to develop.
In contrast, although small science data sets are only in the range of gigabytes, the effective software storage infrastructures that enable their reuse and repurposing are expensive to develop. This BoF will discuss why dark data is increasingly important in an era of data deluge that is perceived to be dominated by large-volume data sets. The BoF will provide an update on international and Australian initiatives to deal with the increasingly complex task of aggregating small-scale, sample-based data sets into homogeneous national data assets that can be reused and repurposed for use cases that the original collectors rarely considered. Biography – Bryan Heidorn Biography – Andrew Treloar Biography – Lesley Wyborn |
