Workshops 2011

Below is a listing of the workshops on offer at eResearch Australasia 2011. Abstracts and authors for each workshop follow the listing.

Full Day Workshops

Federating eXtreme Services: A Skills Development Workshop
Making the Semantic Web Work For Physical Science
Maximizing HPC utilization with PBS Professional®

Half Day Workshops

Building a culture of research data citation
Exploiting VIVO for eResearch Activities
Introduction to Machine Learning
Managing a research project with Heurist version 3
ReDBox and Mint Workshop 1: research data management for your organisation
ReDBox and Mint Workshop 2: technical primer
Using OPeNDAP-enabled Applications and Tools for Accessing Australian Data Repositories
Virtual Research Environment Toolkit for SharePoint

NEW! RDSI DaSh and ReDS Tinman Consultation (No Charge)

Workshop Abstracts

Building a culture of research data citation
Jan Brase, Stuart Hungerford, Karen Visser

The Australian research community has for many years valued the consistent and persistent citation of research publications. A parallel mechanism for the citation of Australian research data sets is now enabled by Cite My Data, a service provided by the Australian National Data Service (ANDS, [1]).

The service is based on the international DataCite [2] initiative, which uses the Digital Object Identifier (DOI) standard [3] to help communities use data citation in the same way that the Crossref initiative [4] supports publication citation. Cite My Data allows Australian researchers to uniquely identify datasets, cite them in research publications and cross-cite them from other datasets. The DOI system is a key enabler of automated citation tracking and indexing, which, if applied to data citations, would provide both an incentive to “publish” data and acknowledgement for doing so.

This workshop is designed for all those with an interest in data citation and how it can be supported by their institutions. The workshop will include:

An overview of data citation in scholarly communication
A practical demonstration of how to use the ANDS DOI service, “Cite My Data”, and integrate it into eResearch systems
A discussion of the value of being part of the international DataCite consortium and DOI system
Examples of emerging practice internationally
Opportunities to learn from the experience of early implementers in Australia
Strategies for policy implementation at research organisations and data archives

Presenters will include Jan Brase from DataCite, early adopters of the Cite My Data service and ANDS staff.
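
As an illustration of what citing a dataset by DOI involves, the following is a minimal Python sketch, not the Cite My Data service or its API. It assumes the citation layout "Creator (Year): Title. Publisher. Identifier" and the public DOI resolver at https://doi.org/. The function names are hypothetical, and the example DOI uses the 10.5555 prefix purely as a placeholder.

```python
from urllib.parse import quote

# Assumption: DOIs are resolved through the public resolver at doi.org.
DOI_RESOLVER = "https://doi.org/"


def doi_url(doi):
    """Return a resolvable URL for a DOI name such as '10.5555/example'."""
    # Keep the '/' between prefix and suffix; percent-encode anything unsafe.
    return DOI_RESOLVER + quote(doi, safe="/")


def format_data_citation(creators, year, title, publisher, doi):
    """Format a dataset citation as 'Creator (Year): Title. Publisher. Identifier'."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {doi_url(doi)}"


if __name__ == "__main__":
    # Hypothetical dataset metadata, for illustration only.
    print(format_data_citation(
        creators=["Researcher, A.", "Researcher, B."],
        year=2011,
        title="Example Rainfall Observations",
        publisher="Example University",
        doi="10.5555/example-dataset",
    ))
```

Because the identifier is a resolvable URL, the same string serves both human readers and automated citation tracking.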

Exploiting VIVO for eResearch Activities
Simon Porter, David Cliff, Jared Winton

VIVO is an open source semantic web application originally developed and implemented at Cornell. When installed and populated with researcher interests, activities, and accomplishments, it enables the discovery of research and scholarship across disciplines at that institution and beyond. VIVO supports browsing and a search function which returns faceted results for rapid retrieval of desired information.

VIVO’s core engine can be used for more than just staff profiling, however. VIVO has been used as the core of the ANDS-funded metadata store developments at the University of Melbourne, Griffith University, and the Queensland University of Technology. VIVO-based metadata store rollouts have now also begun at several other institutions. The VIVO engine itself also offers significant potential for humanities-based semantic web knowledge dissemination activities.

This workshop will cover the basics of how to install and configure VIVO, how to use VIVO as an ANDS metadata store, how to integrate VIVO metadata with eResearch tools, and how to design data entry workflows.

Knowledge of the semantic web is not essential for this workshop, although the following text is highly recommended for those interested:

Allemang D, Hendler J. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL.

Federating eXtreme Services: A Skills Development Workshop
Terry Smith, Bradley Beddoes, Glenys Kranz

The Australian Access Federation (AAF) was established to meet the needs of the Australian research and university sector to seamlessly and securely access federation services and resources using a single sign-on solution provided by users’ home institutions. So how do we develop and extend applications to get the most out of participating in the AAF and take advantage of the 1 million plus users of the federation?

The workshop is designed for developers bringing services into the Federation and will provide useful content both for new developers and for those who have previously worked with federated services. The presenters will demonstrate the methods they have developed and share the experience they have gained, to make the task of federating your services as simple as possible.

It is an excellent opportunity for all those involved in getting technically connected to share their experiences, to provide feedback and get assistance if necessary. It’s also an opportunity to network with other subscribers of the AAF community.

At the conclusion of the workshop you will:

Have an understanding of the Federation Registry management tool;
Know how to register a new service;
Understand how to install the Shibboleth service provider software;
Know how to obtain information about users accessing your service;
Understand how to integrate new and existing applications to the federation using AAF supported integration libraries for popular platforms such as Java/Groovy, PHP, Ruby and .NET;
Have a fully working virtual machine connected to the AAF test federation for future reference.

Participant numbers will be limited to 20 and may have to be limited to no more than one per institution if there is high demand.

A VirtualBox (http://www.virtualbox.org) virtual machine based on CentOS 5 will be provided to attendees for use during the workshop. To accommodate the requirements of the VM, attendees must bring a laptop to the workshop with VirtualBox already installed. Windows users will also require an SSH client such as PuTTY.

Introduction to Machine Learning
Joe Thurbon

Machine learning is the automated, data-driven discovery of programs or hypotheses that can perform tasks that are otherwise difficult to program. It includes sub-fields for classification, planning and parameter estimation, and has close ties to robotics, optimisation, medical diagnosis and expert systems.

The field is still evolving rapidly, but some of its key techniques are now available in conveniently packaged form, both as APIs and as interactive workbenches.

This workshop will introduce some of the fundamental concepts of classification machine learning, data representation and how to design experiments that evaluate machine learning algorithms. Attendees will use WEKA (http://www.cs.waikato.ac.nz/ml/weka/), one of the standard ML workbenches, for experiments. Attendees will leave with two key skills:

How to cast their research problem as a machine learning problem
How to determine an appropriate evaluation technique that is relevant to their research goals

They will also be exposed to and have a chance to experiment with a collection of machine learning algorithms.
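
To give a flavour of what designing an evaluation experiment can mean, the following is a minimal Python sketch of k-fold cross-validation. It is not taken from the workshop material, which uses WEKA. It compares a 1-nearest-neighbour classifier against a majority-class baseline. The toy dataset and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_class(labels):
    """Baseline: always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def nearest_neighbour(train_x, train_y, x):
    """Predict the label of the training point closest to x (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def cross_validate(xs, ys, k=5, seed=0):
    """Return (1-NN accuracy, baseline accuracy) estimated by k-fold CV."""
    correct_nn = correct_base = 0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        baseline = majority_class(train_y)
        for i in fold:
            # Each example is scored only by a model that never saw it.
            correct_nn += nearest_neighbour(train_x, train_y, xs[i]) == ys[i]
            correct_base += baseline == ys[i]
    return correct_nn / len(xs), correct_base / len(xs)


if __name__ == "__main__":
    # Toy data: two well-separated clusters, ten points each.
    xs = [(i * 0.1, i * 0.1) for i in range(10)]
    xs += [(10 + i * 0.1, 10 + i * 0.1) for i in range(10)]
    ys = ["a"] * 10 + ["b"] * 10
    nn_acc, base_acc = cross_validate(xs, ys, k=5)
    print(f"1-NN accuracy: {nn_acc:.2f}, baseline accuracy: {base_acc:.2f}")
```

Reporting a baseline alongside the classifier is a simple way to check that a result is meaningful for the research goal, rather than an artefact of class balance.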

Making the Semantic Web Work For Physical Science
Nico Adams, Peter Murray-Rust, Alex Wade

The semantic web is the next phase in the evolution of the world wide web from a web of documents to a web of objects. This, in turn, is triggering profound change in the way in which data and information are generated, communicated and exchanged.

The workshop will provide participants with an introduction to the fundamental concepts behind the semantic web and will introduce the technology stack currently used to implement these concepts. It will then go on to demonstrate how the semantic web can be “made to work” for the domain of physical science, by showing:

how science data can be semantically enriched and made portable
how semantically rich data can be made computable and therefore machine comprehensible
how semantic data can be authored or obtained from previously unstructured information as well as disseminated.

We will demonstrate this by focussing primarily on scientific documents (articles, theses, reports, etc.), which are a rich source of implicit semantic information. Using text mining and other techniques, we will show how, for example, physical and geoscientific information can be extracted with high recall and accuracy. Typical results include physical quantities (e.g. temperature, mass, etc., including units and errors), geo-locations, researcher and project identities (organization names, researchers, campaigns, etc.), chemical entities and others. We will also give examples of best practices in repository technology and semantic data management.

At the end of the workshop, attendees will have both a theoretical and a practical understanding of the fundamental technology of the semantic web and how it can be used to generate semantically rich documents. Furthermore, attendees will gain an understanding of how to convert unstructured documents to structured ones and how to publish them in repositories.
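
As a rough indication of what extracting physical quantities from unstructured text can look like, the following is a minimal Python sketch based on a regular expression. It is not the presenters' tooling, and production text mining relies on far more robust methods. The unit list, pattern and function name are assumptions chosen for illustration.

```python
import re

# Assumption: a deliberately tiny unit vocabulary, longest alternatives first.
QUANTITY = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?)"                  # numeric value
    r"(?:\s*(?:±|\+/-)\s*(?P<error>\d+(?:\.\d+)?))?"  # optional error term
    r"\s*(?P<unit>°C|K|kg|mg|g|mL|L)\b"               # unit symbol
)


def extract_quantities(text):
    """Return a list of dicts with 'value', 'error' (or None) and 'unit'."""
    results = []
    for match in QUANTITY.finditer(text):
        error = match.group("error")
        results.append({
            "value": float(match.group("value")),
            "error": float(error) if error is not None else None,
            "unit": match.group("unit"),
        })
    return results


if __name__ == "__main__":
    sentence = "The sample (12.5 g) was heated to 350 K and held at 25.0 ± 0.5 °C."
    for quantity in extract_quantities(sentence):
        print(quantity)
```

Each extracted record is structured data that could then be expressed in a semantic format and attached to the source document.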

Managing a research project with Heurist version 3
Steven Hayes

Heurist v3 is an open source eResearch toolkit designed for flexibility. Heurist v3 combines a wide range of workflows and web presentation methods with a smart, network-based data model to provide the path of least resistance for researchers wishing to create collaborative workspaces to gather, preserve, enhance and publish their research data.

This workshop will work through all aspects of using Heurist and aims to give participants a good enough introduction to enable them to start using Heurist in their own research immediately after returning from eResearch.

Topics covered will include handling of historical maps, images, text (specifically TEI and annotation), timelines, sharing, tagging and publishing. Participants are encouraged to bring their own example datasets.

Maximizing HPC utilization with PBS Professional®
Dario Dorella, Rajesh Chhabra

With the influx of multi-core technology and GPUs, researchers these days have an enormous amount of computing resources available, provided they understand how to utilize them effectively.

This workshop aims to provide hands-on training in ways to improve the utilization of HPC resources using PBS Professional®. A workload manager or scheduler is the lifeline of any HPC machine, and tuning and configuring it effectively can make a tremendous difference to overall performance.

OUTLINE

Introduction to common scheduling policies. This topic will bring all participants onto the same page with a basic introduction to the types of scheduling policies used for managing HPC resources.
Advanced scheduling policies – advanced fairshare, advanced reservations. This topic will explain how to configure advanced scheduling policies and maximize the utilization of resources.
Job management – placement sets, job arrays, tuning MPI jobs. Covering complex job management scenarios and solutions.
How to utilize GPUs with PBS Professional®. GPUs have begun to coexist with HPC resources, and an effective way to utilize them alongside CPUs can add substantially to the computing power in hand.
Cloud monitoring. To improve the performance of a cluster or a cloud, you need to measure it first!
Introduction to PBS Analytics – a tool for measuring and monitoring HPC resources. An essential tool for any cloud setup.

ReDBox and Mint Workshop 1: research data management for your organisation
Vicki Picasso, Peter Sefton, Kai Chen, Duncan Dickinson, Gregory Pendlebury, Dave Huthnance

ReDBox is a research data registry that provides organisations with the ability to describe research data and publish metadata to systems such as Research Data Australia. It has a flexible workflow system for cataloguing data sets and linking to them wherever they reside, as well as capacity for storing research data. The system was designed with input from repository managers, librarians, research administrators and researchers for ease of use and ease of configuration. The Mint provides name authority services that allow for the correct identification of parties (people and groups), activities and controlled vocabularies, drawing on institutional data sources such as HR and Research Management systems, and linking with global infrastructure such as People Australia. Working together, the ReDBox and Mint software forms the basis of the next generation of institutional repository that fits within a national and international data sharing ecosystem.

This workshop will provide a non-technical overview of the ReDBox and Mint systems and discuss how they can help your organisation keep track of where research data resides, describe it for compliance with the Australian Code for the Responsible Conduct of Research and link it into the Australian research data ecosystem, including Research Data Australia. The session will outline operational processes that can help keep track of where research data resides, and demonstrate functionality to describe and disseminate rich descriptions of data collections while enabling compliance with the standard required for the Australian National Data Service discovery portal.

Please note: This workshop is designed to precede the ReDBox and Mint Workshop 2

ReDBox and Mint Workshop 2: technical primer
Duncan Dickinson, Gregory Pendlebury, Peter Sefton, Dave Huthnance, Vicki Picasso and Kai Chen

ReDBox is a research data registry that provides organisations with the ability to describe research data and publish metadata to systems such as Research Data Australia. The Mint provides name authority services that allow for the correct identification of parties (people and groups), activities and controlled vocabularies. Working together, the ReDBox and Mint software forms the basis of the next generation of institutional repository that fits within a national and international data sharing ecosystem.

This workshop will provide attendees with an understanding of the architecture and design decisions behind the ReDBox and Mint software. The primary experience will be the installation and configuration of the software on participants’ laptops so that they can return to their workplace with a sound understanding and live demonstration of the system.

Please note: This workshop is designed to follow the ReDBox and Mint Workshop 1

Using OPeNDAP-enabled Applications and Tools for Accessing Australian Data Repositories
Tim Pugh

This half-day workshop will give attendees an opportunity to discover and access the latest OPeNDAP server systems, known as Hyrax and THREDDS, at Australian research sites. Workshop segments are organized to encourage participants to interact with these data services, so as to understand how the services' features are utilized by service providers and by users of OPeNDAP-enabled applications and tools.

The workshop will cover four areas: one to discover data services and data products, another to show applicable uses of data services and client applications, another to focus on metadata discovery and data access/extraction, and a final segment on accessing geospatial and aggregation data services. The presenters seek attendee feedback on the utility of the services and client-side tools, and welcome discussion of future tools and applications.
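
To illustrate the kind of data access and extraction an OPeNDAP server supports, the following is a minimal Python sketch that builds a subsetting request URL. It assumes the DAP2 constraint-expression form `variable[start:stride:stop]` and the `.ascii` response suffix. The server address, dataset path, variable name and function names are hypothetical.

```python
def hyperslab(start, stop, stride=1):
    """Render one dimension's index range as a DAP2 hyperslab '[start:stride:stop]'."""
    return f"[{start}:{stride}:{stop}]"


def opendap_url(dataset_url, variable, slices, response="ascii"):
    """Build a request URL for a subset of one variable.

    dataset_url: base URL of the dataset on the server (hypothetical here)
    variable:    name of the array variable to subset
    slices:      one (start, stop) or (start, stop, stride) tuple per dimension
    response:    response suffix, e.g. 'ascii', 'dds', 'das' or 'dods'
    """
    constraint = variable + "".join(hyperslab(*s) for s in slices)
    return f"{dataset_url}.{response}?{constraint}"


if __name__ == "__main__":
    # Hypothetical dataset: sea surface temperature indexed by time, lat, lon.
    url = opendap_url(
        "http://example.org/opendap/sst.nc",
        "sst",
        [(0, 0), (10, 20), (30, 40)],
    )
    print(url)
```

Only the requested subset is transferred, which is what makes this style of access practical for large repositories.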

Virtual Research Environment Toolkit for SharePoint
Alex Wade, Lee Dirks

This workshop will provide an update on the Research Information Centre (RIC) project between the British Library and Microsoft Research, and will demonstrate a set of freely available, open source virtual research environment (VRE) toolkits that extend Microsoft Office SharePoint in a number of areas of interest to researchers, research lab directors, data management teams, and others offering services to researchers.
