Generalised Platforms for Small Data Research: Lessons from Six Years of FAIMS Mobile

Brian Ballsun-Stanton1, Shawn A Ross2, Adela Sobotkova3

1Macquarie University, Sydney, Australia, brian.ballsun-stanton@mq.edu.au

2Macquarie University, Sydney, Australia, shawn.ross@mq.edu.au

3Macquarie University, Sydney, Australia, adela.sobotkova@mq.edu.au 

Data collection is fundamental to field research from archaeology to environmental science. Scientists, engineers, and technicians are turning to mobile devices such as tablets and smartphones to capture data in the field. Citizen science and community heritage projects allow crowdsourcing of data collection beyond what traditional research teams can accomplish, while educating and involving the public in the scientific endeavour. While some software is available to support general data collection, none provides the functionality or flexibility necessary for environmental and cultural research and monitoring in diverse circumstances. Neither does existing software support the development and sharing of new features and functions to foster communities of practice.

The lack of such software hinders field research. In a recent edition of the journal Science, McNutt et al. argue that ‘field sciences’ like archaeology, geology, and ecology lack transparency and reproducibility, compromising research results [1]. Too often, data sharing in field research amounts to merely ‘data and samples available upon request’ [1], while analytical processes, like the code used to process datasets, are not available at all [2]. Much human-mediated field research suffers from ‘small science’ data problems: diverse and idiosyncratic data, customised methodologies and recording systems, lack of standards, and limited budgets, which together restrict the availability of high-quality, compatible data [3,4]. This problem is exacerbated by new data-intensive field research methods, like geophysics and photogrammetry, that have increased the quantity of data being collected by researchers. The culture of field research has often preferred one-off solutions built for individual projects or organisations, a situation that leads to duplication, under-resourced development efforts, sustainability problems, and unfamiliarity with good practice. It also ‘prioritizes publications, innovation, and insight, which puts data stewardship and reuse far down the list’ [1]. Researchers tend to make do with mass-market software not designed with research in mind and unresponsive to community needs, requiring them to compromise their approaches [5]. As a consequence, field researchers often organise information idiosyncratically, using an ad-hoc mix of hard copy, data fragments in various formats, and bespoke databases [3,4,6,7]. Data then gets trapped in hard-copy archives, local storage, or digital ‘silos’, all of which make data difficult to discover and reuse [8]. Where digital datasets exist, they are often highly variable, of poor quality, and incompatible. 
These deficiencies not only waste time and effort and slow the publication of field research, but also inhibit reproduction or verification of results, independent analyses of primary data, the application of new techniques to old datasets, and the combination of datasets from multiple studies for large-scale research to address ‘grand challenges’ in field-based disciplines [1,6,9]. Furthermore, in many cases they do not meet international good practice (e.g., the FAIR Data Principles [10]) or the data management and dissemination expectations of funding bodies (e.g., the US National Science Foundation or the ARC itself [11]).

Field researchers have long struggled with the digitisation bottleneck; online data services have long existed, but they remain under-populated because getting findable, accessible, interoperable, and reusable (FAIR) digital data into them is costly and time-consuming. Stocktaking by the US Geological Survey and Bureau of Reclamation indicates that no existing software meets field researchers’ needs [12], while reliable bespoke software is difficult and expensive to create and maintain.

Thus, small-data disciplines in the sciences, social sciences, and humanities are characterised by limited resources, diverse practice, and heterogeneous data. Information infrastructure often emerges from – and during – research [4]. Mass-market database software requires time-consuming customisation and fails to meet many research needs, while bespoke software development is costly and unsustainable. What is the solution?

The authors have six years’ experience developing and deploying FAIMS Mobile, a platform that allows researchers to generate custom field data collection software with tailored interfaces, data structures, automation, and other features [13].

Based on this background, we argue that researchers in small-data disciplines deserve fit-for-purpose, research-specific applications, and we discuss the key features of sustainable small-data software, including:

  • ‘Generalised’ architectures, in which the ‘core’ software meets research-specific needs, can be used across many disciplines (facilitating a larger user community and spreading costs), and also allows profound customisation for the diverse data and workflows in our disciplines at a lower cost than bespoke software development.
  • Modular, ‘loosely coupled’ approaches, where independent applications work together, with the expectation that while individual components come and go due to the vagaries of funding and software development, researchers will never be left stranded.
  • Open-source licensing, allowing limited resources from multiple organisations to be pooled, and for software to be passed from one project or organisation to the next, with minimal friction.
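The ‘generalised’ idea in the first point above can be illustrated with a minimal sketch in which a single generic core interprets project-specific record definitions. This is not FAIMS Mobile's actual definition format or API; the class names (`FieldSpec`, `RecordType`) and the two example record types are hypothetical and exist only to show how one core can serve different disciplines without bespoke code for each.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One attribute of a record type, declared by a project rather than hard-coded."""
    name: str
    kind: type            # expected Python type of the value
    required: bool = True


@dataclass
class RecordType:
    """A project-specific record definition interpreted by the generic core."""
    name: str
    fields: list[FieldSpec] = field(default_factory=list)

    def validate(self, record: dict) -> list[str]:
        """Return a list of problems; an empty list means the record is valid."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
                continue
            if not isinstance(record[spec.name], spec.kind):
                problems.append(f"{spec.name}: expected {spec.kind.__name__}")
        return problems


# Hypothetical definitions for two different disciplines sharing one core.
survey_unit = RecordType("SurveyUnit", [
    FieldSpec("unit_id", str),
    FieldSpec("sherd_count", int),
    FieldSpec("notes", str, required=False),
])
water_sample = RecordType("WaterSample", [
    FieldSpec("sample_id", str),
    FieldSpec("ph", float),
])

print(survey_unit.validate({"unit_id": "U-001", "sherd_count": 12}))
print(water_sample.validate({"sample_id": "W-7", "ph": "acidic"}))
```

In this sketch, customisation lives in data (the record definitions) rather than in code, which is one way the cost of supporting a new project can stay below that of bespoke development.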

We will briefly present a few examples of such software, then discuss how FAIMS Mobile implemented these principles, how it worked in the field, and how we would approach such software today based on lessons learned.

REFERENCES

  1. McNutt M, et al. Liberating field science samples and data. Science. 2016 Mar 4;351(6277):1024–6.
  2. Marwick B. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. J Archaeol Method Theory. 2017 June 1;24(2):424–50.
  3. Kansa EC, Bissell A. Web syndication approaches for sharing primary data in ‘small science’ domains. Data Science Journal. 2010;9:42–53.
  4. Borgman CL. Big data, little data, no data: scholarship in the networked world. MIT Press; 2015.
  5. Sobotkova A, et al. Measure Twice, Cut Once: Cooperative Deployment of a Generalized, Archaeology-Specific Field Data Collection System. In: Averett EW, Gordon JM, Counts DB, (eds.) Mobilizing the Past for a Digital Future: The Potential of Digital Archaeology. The Digital Press @ University of North Dakota; 2016. p. 337–72.
  6. Kintigh K. The Promise and Challenge of Archaeological Data Integration. Am Antiq. 2006;71(3):567–78.
  7. Snow DR, et al. Cybertools and archaeology. Science. 2006;311(5763):958–9.
  8. Blanke T, Hedges M. A Data Research Infrastructure for the Arts and Humanities. In: Managed Grids and Cloud Systems in the Asia-Pacific Research Community. Springer, Boston, MA; 2010. p. 179–91.
  9. Kintigh K, et al. Grand challenges for archaeology. Proc Natl Acad Sci U S A. 2014 Jan 21;111(3):879–80.
  10. Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016 Mar 15;3:160018.
  11. NSF. Dissemination and Sharing of Research Results [Internet]. Office of Budget Finance & Award Management. 2004 [cited 2017 Mar 29]. Available from: https://www.nsf.gov/bfa/dias/policy/dmp.jsp
  12. DataApp | Research and Development Office [Internet]. Bureau of Reclamation. 2017 [accessed 2018 Mar 27]. Available from: https://www.usbr.gov/research/challenges/dataapp.html
  13. Ballsun-Stanton B, et al. FAIMS Mobile: Flexible, open-source software for field research. SoftwareX. 2018.

Biography:

Brian Ballsun-Stanton: https://orcid.org/0000-0003-4932-7912

Shawn Ross:  https://orcid.org/0000-0002-6492-9025

Adela Sobotkova: https://orcid.org/0000-0002-4541-3963

Adela Sobotkova is a Research Fellow at Macquarie University, Sydney. Her research combines archaeology and digital methods to study the long-term history of the Balkans and Black Sea region, with emphasis on the evolution of social complexity.

Dr Sobotkova is a landscape archaeologist who studies past settlement patterns in their environmental context, with special focus on the rise and decline of social complexity and human-environment interactions. Much of her research involves aggregation of datasets for large-scale synthetic studies. Dr Sobotkova is an advocate of reproducible workflows and deep digital practice in archaeology; her forte is open-source mobile field recording, data management, and regional remote sensing for cultural heritage monitoring.

Opening the Research Data Floodgates

Brett Rosolen1, Chris Myers2, David Wilde3, Peter Elford4

1AARNET, Sydney, Australia, Brett.Rosolen@aarnet.edu.au

2AARNET, Melbourne, Australia, Chris.Myers@aarnet.edu.au

3AARNET, Melbourne, Australia, David.Wilde@aarnet.edu.au

4AARNET, Canberra, Australia, Peter.Elford@aarnet.edu.au

 

The internet is there to share … or is it??

Data-intensive research has the potential to generate an incredibly large volume of data. Some of the largest datasets on the planet are in the realm of hundreds of petabytes. Moving this data for analysis on high-performance compute, or simply to collaborate on it, requires high-capacity, frictionless networks that can perform over very long distances. Large single flows (dubbed “elephant flows”) often dwarf all other traffic and have the potential to consume all available bandwidth.

How can we cater for research that needs almost exclusive use of the network at times without causing a detrimental impact on other users? How do we ensure that the “business as usual” functions of the many are as reliably served as the intensive demands of the few?

Much has already been done to enhance Australia’s National Research and Education Network, AARNet, and make this frictionless networking a reality. The national backbone now operates at 100Gbps, with multiple 100Gbps services in place across the Pacific. 100Gbps connections architected for science at the campus edge (called “Science DMZ”) have been deployed at various sites to improve data transfer while preserving site network security. A suite of tools for data sharing, including Cloudstor and Filesender, has contributed significantly to the daily workflows of the wider research community.
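To give a sense of scale for these figures, the following back-of-the-envelope sketch estimates how long a single flow would take to move one petabyte. It assumes a decimal petabyte (10^15 bytes), a link running at full line rate, and no protocol overhead or contention; the function name and the `efficiency` parameter are illustrative only, and real transfers will be slower.

```python
def transfer_time_hours(size_bytes: float, link_bits_per_second: float,
                        efficiency: float = 1.0) -> float:
    """Idealised transfer time in hours for a single flow on a dedicated link."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


PETABYTE = 10**15   # decimal petabyte, in bytes
GBPS = 10**9        # one gigabit per second, in bits per second

# One petabyte over a 100 Gbps path, in hours.
print(round(transfer_time_hours(PETABYTE, 100 * GBPS), 1))   # 22.2

# The same dataset over a 1 Gbps path, in days.
print(round(transfer_time_hours(PETABYTE, 1 * GBPS) / 24, 1))   # 92.6
```

Even under these ideal assumptions a petabyte occupies a 100Gbps path for most of a day, which is why a single elephant flow can dominate a shared link.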

AARNet has been able to demonstrate that a little effort put into border architecture, data transfer tools and data handling workflows can result in extremely large data flows between research infrastructure services and instruments, without impacting on a broader range of users of the entire national network. AARNet is continuing to build new network infrastructure and new platforms and technologies to address this challenge, and to support extreme, unique and evolving research institution requirements.

This talk will identify the science research drivers and subsequent design approach of the network and services, and how the research community can use it to freely move data for better science research outcomes.


Biography:

Brett’s role as Senior Research Analyst is to ensure that the research community learns to expect more from the significant long-term investment in the network and services provided by AARNet, and to understand that research networks worldwide are architected to serve even the most data-intensive sciences and collaborations.

Sensitive Data – How do you do yours?

Frankie Stevens1, Jeff Christiansen2, Kate Le May3, Steve McEachern4, Angela Gackle5, Ingrid Mason1

1AARNet, Sydney, Australia, frankie.stevens@aarnet.edu.au, Ingrid.mason@aarnet.edu.au

2med.data.edu.au and QCIF, jeff.christiansen@qcif.edu.au

3Australian Research Data Commons, Canberra, Australia, kate.lemay@ands.org.au

4ADA, steven.mceachern@anu.edu.au

5TERN, a.gackle@uq.edu.au

 

Sensitive data are data relating to people, animal or plant species, data generated or used under a restrictive commercial research funding agreement, and data likely to have significant negative public and/or personal impact if released. Major, familiar categories of sensitive data are: data concerning human participants, data relating to species of plants or animals and commercially sensitive data. Most research institutions will have some form of sensitive data, yet there is no commonly adopted process, policy or storage architecture employed across institutions in Australia when it comes to sensitive data.

The legal framework around sensitive data is complex and differs within and between nations. Different pieces of legislation regulate the collection, use, disclosure and handling of sensitive data, and there are also many ethical considerations around the management and sharing of sensitive data, in addition to funding body compliance requirements. Together, these present a confusing landscape for researchers wanting to work collaboratively with sensitive data, keep it safe, make it FAIR, and perhaps enable its reuse.

This 60-minute BoF will be facilitated by AARNet as a national provider of research data storage and data movement technologies. The BoF will feature a small number of guest speakers representing a broad range of perspectives on sensitive data (medical, cultural, species, …), who will briefly (3–5 min) share with delegates what approaches and infrastructure they employ for their sensitive data needs to enable collaboration, security, FAIRness and reusability.

Participants in the BoF will be able to engage with live Q&As using Direct Poll to contribute to, and guide discussions within the BoF, all visible through live charts. Using this technology, the BoF will determine what the common challenges are when dealing with sensitive data, and what potential solutions might address these. The BoF will also present opportunities to guide a national strategic approach to the management of sensitive data.


Biography:

The authors represent expertise in sensitive data that ranges from the Medical/Health disciplines, through to Cultural and Ecology perspectives.

Dr Frankie Stevens and Ingrid Mason currently hold roles with AARNet, Australia’s National Research and Education Network, and have many years of eResearch experience between them.

Jeff Christiansen of QCIF is an expert on the legislative considerations surrounding sensitive data, having authored the excellent med.data.edu.au discussion paper on the topic.

Kate LeMay is a Senior Research Data Specialist with the ARDC, and has a keen interest in sensitive data, particularly with respect to ethical considerations.

Steve McEachern is Director of the Australian Data Archive, which provides a national service for the collection and preservation of digital data relating to social, political and economic affairs.

Angela Gackle from TERN brings representation on the sensitivities of ecological data, where threatened animal and plant species might be at risk.

Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display

Peter Sefton1, Michael Lynch2, Liz Stokes3, Gerry Devine4

1University of Technology Sydney, peter.sefton@uts.edu.au

2University of Technology Sydney, michael.lynch@uts.edu.au

3University of Technology Sydney, elizabeth.stokes@uts.edu.au

4Western Sydney University, g.devine@westernsydney.edu.au

 

At eResearch Australasia 2017 we presented an early version of DataCrate [1], a standard for packaging research data for distribution as data sets and for hosting on web sites. DataCrate builds on existing standards for packaging data (BagIt [2]) and metadata (Linked Data via JSON-LD, similar to a method described by Wang et al. [3], with discovery metadata from schema.org and the SPAR ontologies [4]), with a further innovation: each package contains an HTML catalog that can be used to describe content down to the file level (and, in future, down to file contents, such as data dictionaries for tabular data files), building on earlier work at Western Sydney University [5].

The system is discipline agnostic, providing core metadata that can be used for data-set discovery and for generating DataCite citations, while being easily extensible to deal with discipline-specific metadata.
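The combination described above (a BagIt-style package, JSON-LD metadata using schema.org terms, and a human-readable HTML catalog) can be sketched with the Python standard library alone. This is an illustrative approximation, not the DataCrate specification or any of the project's tools: the catalog file names `CATALOG.json` and `CATALOG.html`, the choice of schema.org properties, the use of BagIt 1.0 declarations with SHA-256 manifests, and the `make_crate` function are all assumptions made for the example.

```python
import hashlib
import html
import json
import tempfile
from pathlib import Path


def make_crate(root: Path, title: str, files: dict[str, bytes]) -> Path:
    """Write a small BagIt-style package with JSON-LD and HTML catalogues."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True)

    manifest_lines = []
    parts = []
    for name, payload in sorted(files.items()):
        (data_dir / name).write_bytes(payload)
        digest = hashlib.sha256(payload).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
        parts.append({
            "@type": "MediaObject",
            "name": name,
            "contentUrl": f"data/{name}",
            "contentSize": str(len(payload)),
        })

    # BagIt declaration and payload manifest (checksum, then path).
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    (root / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8")

    # Machine-readable catalogue: JSON-LD with schema.org terms (assumed layout).
    catalog = {"@context": "https://schema.org/", "@type": "Dataset",
               "name": title, "hasPart": parts}
    (root / "CATALOG.json").write_text(
        json.dumps(catalog, indent=2), encoding="utf-8")

    # Human-readable catalogue listing every file in the package.
    rows = "".join(
        f"<li>{html.escape(p['contentUrl'])} ({p['contentSize']} bytes)</li>"
        for p in parts)
    (root / "CATALOG.html").write_text(
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1><ul>{rows}</ul></body></html>",
        encoding="utf-8")
    return root


with tempfile.TemporaryDirectory() as tmp:
    crate = make_crate(Path(tmp) / "example-crate", "Example cave survey",
                       {"readings.csv": b"depth_m,temp_c\n10,14.2\n"})
    print(sorted(p.relative_to(crate).as_posix() for p in crate.rglob("*")))
```

Keeping both catalogues inside the package means the data remain self-describing wherever the package is copied, for people opening the HTML file and for software reading the JSON-LD.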

In this presentation we will launch version 1.0 of the DataCrate standard. The presentation will cover:

  • The motivation for this work, and prior art – why we needed to bring together the standards we did in the way that we did.
  • A walk-through of example data crates from a variety of sources: speleology, clinical trials, simulation, social history, environmental science and microbiology.
  • An introduction to tools for making data crates with an appeal to attendees to join us in making more tools, for more new kinds of data.
  • A demonstration of how DataCrates are being used at UTS to move data through the research lifecycle – archiving and publishing data.

MOTIVATION

DataCrate is a standard that enables researchers to apply FAIR data principles [6] to how they manage research data (given tools to do so, which we discuss below). As such it demonstrates an affordance that is currently lacking, or yet to be developed, in institutional data management infrastructure. Going beyond abstract exhortations for research data management, DataCrate enables good data management practice by providing data documentation that is human readable and comprehensible by the end user, as well as enabling automated processes to support key steps in archiving and publishing research data.

TOOLS

HIEv DataCrate – At the Hawkesbury Institute for the Environment at Western Sydney University, a bespoke data capture application (HIEv) harvests a wide range of environmental data (and associated file-level metadata) from both automated sensor networks and analysed datasets generated by researchers. Leveraging built-in APIs within HIEv, a new packaging function has been developed, allowing selected datasets to be identified and packaged in the DataCrate standard, complete with metadata automatically exported from the HIEv metadata holdings into the JSON-LD format. Going forward, this will allow datasets within HIEv to be published regularly and in an automated fashion, in a format that will increase their potential for reuse.

Calcytejs is a command-line tool for packaging data into DataCrate, developed at the University of Technology Sydney. It allows researchers to describe any data set using spreadsheets, which the tool auto-creates in a directory tree.

Omeka DataCrate Tools is a collection of proof-of-concept tools, written in Python, for exporting data from Omeka Classic repositories into the DataCrate format.

A tool in development for exporting DataCrates from the Omero microscopy repository will also be presented.

DATACRATES IN THE RESEARCH LIFECYCLE

At the University of Technology Sydney, the Provisioner is an open framework for integrating good research data management practices into everyday research workflows. It uses DataCrates as a flexible interchange format to move datasets between diverse research apps such as lab notebooks, code repositories (where data is included by reference), survey tools, collection management tools, and into archival and publication workflows. Examples of DataCrates moving through the research lifecycle will be provided.

REFERENCES

  1. Sefton, P. DataCrate: Formalising ways of packaging research data for re-use and dissemination, Presentation, eResearch Australasia 2017, https://conference.eresearch.edu.au/2017/08/datacrate-formalising-ways-of-packaging-research-data-for-re-use-and-dissemination/, accessed 22 June 2018.
  2. Kunze, John, Andy Boyko, Brian Vargas, Liz Madden, and Justin Littman. “The BagIt File Packaging Format (V0.97).” Accessed March 1, 2013. http://tools.ietf.org/html/draft-kunze-bagit-06.
  3. Wang, Jingbo, Amir Aryani, Lesley Wyborn, and Ben Evans. “Providing Research Graph Data in JSON-LD Using Schema.Org.” In Proceedings of the 26th International Conference on World Wide Web Companion, 1213–1218. WWW ’17 Companion. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2017. https://doi.org/10.1145/3041021.3053052.
  4. Peroni, S., Shotton, D. (2018). The SPAR Ontologies. To appear in Proceedings of the 17th International Semantic Web Conference. https://w3id.org/spar/article/spar-iswc2018/
  5. Sefton, Peter, Peter Bugeia, and Vicki Picasso. “Pick, Package and Publish Research Data: Cr8it and Of The Web.” In EResearch Australasia 2014. Melbourne, 2014. http://eresearchau.files.wordpress.com/2014/07/eresau2014_submission_30.pdf.
  6. Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (March 15, 2016): 160018.


Biography:

Peter Sefton is the Manager, eResearch Support at the University of Technology, Sydney (UTS).

At UTS Peter is leading a team which is working with key stakeholders to implement university-wide eResearch infrastructure, including an institutional data repository, as well as collaborating widely with research communities at the institution on specific research challenges. His research interests include repositories, digital libraries, and the use of The Web in scholarly communication.

Mike Lynch is an eResearch Analyst in the eResearch Support Group at UTS. His work involves solution design, information architecture and software development supporting research data management. His other interests include data visualisation and functional programming languages.

From Scarcity to Abundance: Reciprocity and its Rewards

Dr Katherine Bode1, Ms Victoria Riddell2

1ANU College of Arts and Social Sciences, Canberra, Australia, Katherine.bode@anu.edu.au

2Trove, National Library of Australia, Canberra, Australia, vriddell@nla.gov.au

 

Data sharing is a significant challenge for eResearch, perhaps especially in humanities fields where there might be little or no tradition of collaboration or data-based research. At the same time, sharing data has significant benefits, including enabling others to verify results or to ask new questions of existing data, making the results of publicly funded research available, and advancing research and innovation (Borgman 2012).

The National Library of Australia’s Trove service (www.trove.nla.gov.au) is a world-leading demonstration of both the possibilities and benefits of data sharing. Trove enables search and discovery across Australian social, cultural and historical collections by harvesting collection information from partner organisations. These partners range from major national and state-based institutions including galleries, libraries, archives, museums, universities and government departments through to small, local and independent clubs and societies.

Over the past decade, the Trove team at the National Library of Australia have overcome key challenges associated with aggregating social and cultural data, including the significant one of making large, constantly growing and evolving datasets accessible and usable for the needs of diverse communities and stakeholders (Appleford et al 2014).

Trove’s data sharing practices have typically moved in a single direction: either data is collected or it is made available. Accordingly, content is sourced from providers and integrated into Trove; then that content is made accessible for researchers and members of the public to collect through a public Application Programming Interface (API). However, the possibility of data sharing working in more than one direction is one that the Trove team has begun to realise.

The intention is to support 360º research – or a virtuous data circle – wherein content is utilised and enriched by the connections Trove enables, and then returned to Trove to further enrich the data available for all. This process aims to create a positive feedback loop, based on the principles of open data and open innovation (Chesbrough 2011), which continually fosters discovery of new knowledge and research opportunities (Zahra 2008).

Trove’s relationship with ‘To be continued: The Australian Newspaper Fiction Database’ is an example of how this virtuous data circle has been achieved. ‘To be continued’ mines the extensive collection of digitised, full text, historical Australian newspapers available through Trove to identify thousands of works of fiction that circulated in Australia in the nineteenth and early twentieth centuries, including multiple Australian works not previously recorded. Trove then harvests records from ‘To be continued’ to create a richer understanding of the contents of these newspapers and to add new editions of existing literary works, as well as entirely new recordings of literary works, to the Trove data corpus. Trove also harvests ongoing crowdsourced contributions to correcting, enhancing, and extending the record of fiction in the ‘To be continued’ database, thus continuing the virtuous data circle existing between Trove and this research project.

This paper describes this collaboration and the multiple communities and opportunities that have eventuated. It also looks at how Trove enables researchers to explore social, cultural and historical collections in a reliable, extensive and systematic way.

References

Appleford, Simon, James R. Bottum, and Jason Bennett Thatcher. “Understanding the social web: towards defining an interdisciplinary research agenda for information systems.” ACM SIGMIS Database: the DATABASE for Advances in Information Systems 45.1 (2014): 29-37.

Borgman, Christine L. “The conundrum of sharing research data.” Journal of the Association for Information Science and Technology 63.6 (2012): 1059-1078.

Chesbrough, Henry W. “The era of open innovation.” MIT Sloan Management Review

Zahra, Shaker A. “The virtuous cycle of discovery and creation of entrepreneurial opportunities.” Strategic Entrepreneurship Journal 2.3 (2008): 243-257.


Biographies:

Katherine Bode is Associate Professor of Literary and Textual Studies at the Australian National University. In 2018, she began a Future Fellowship entitled ‘Reading at the Interface: Literatures, Cultures, Technologies’. Her publications include Reading by Numbers: Recalibrating the Literary Field (2012) and A World of Fiction: Digital Collections and the Future of Literary History (2018).

Victoria Riddell is part of a small team dedicated to all things data, discovery and delivery for Trove (trove.nla.gov.au).

She provides advice on Trove’s extensive range of harvesting mechanisms, metadata schemas, data mapping and data sharing protocols.

Getting onboard with CloudStor

Mr Chris Myers1

1AARNet, Carlton South, Australia, Chris.Myers@aarnet.edu.au

DESCRIPTION

Cloudstor is an AARNet-developed and supported service that enables AARNet customers and the wider community to quickly and securely sync, share and store files using the high-speed AARNet network.

We will present and demonstrate Cloudstor’s ability to support and interact with your scientific workflows to help improve outcome delivery, the tools that help institutions support Cloudstor users, and the provisioning of services. We will also cover how Cloudstor works and how to get onboard.

1. Session 1
   a. Service overview
      i. How can Cloudstor support your current and future research data storage requirements?
   b. Roadmap
      i. New features to be released over the next 12 months
   c. What features are available
      i. Cloudstor Filesender
         1. Sending files
         2. Receiving files
         3. File encryption
      ii. Cloudstor ownCloud
         1. Sync and share desktop client
         2. WebDAV connections
         3. Mobile client
         4. Sharing files with other Cloudstor users
         5. Sharing files with external parties
         6. Receiving files from external parties
      iii. Rocket use
      iv. Cloudstor S3 gateway
         1. Sending and receiving
         2. Third-party integration

2. Session 2
   a. Tenant Portal
      i. Provision institutes
         1. Group drives
         2. Collaborators
         3. Users
      ii. Support clients
      iii. Monitor usage
      iv. Generate reporting

3. Session 3
   a. Cloudstor and workflow
      i. Workflow discovery session
      ii. Solution discussion
      iii. Best practice overview
   b. Onboarding
      i. How we can help
      ii. Support channels
      iii. Support workflow
      iv. What you need to know
      v. Discussion on what we need to do to improve your experiences

WHO SHOULD ATTEND

Anyone who uses Cloudstor, including research support staff, IT support staff and researchers.

WHAT TO BRING

Laptop of any flavour.


BIOGRAPHY

Chris Myers is a data solutions specialist at AARNet who supports members and clients working in data-intensive research environments, equipping them with the tools and knowledge required to leverage the network, compute and storage investments and to enable collaborative activities that accelerate research outcomes.

Chris is the product manager for Cloudstor.

Hivebench helps life scientists unlock the full potential of their research data

Mrs Elena Zudilova-Seinstra1, Mr Julien Therier1

1Elsevier, Amsterdam, The Netherlands

 

Synopsis: By integrating Hivebench ELN with an institutional repository, or the free data repository Mendeley Data, you can maximize the potential of your research data (see diagram below) and secure its long-term archiving. Hivebench supports compliance with data mandates and the storage of research process details, making results more transparent, reproducible and easier to store and share.

Indeed, storing information in private files or paper notebooks poses challenges, not only for individual life scientists, but for their lab as a whole. An Electronic Lab Notebook stores research data in a well-structured format for ease of reuse, and simplifies the process of sharing and preserving information. It also structures workflows and protocols to improve experiment reproducibility.

Format of demonstration: Live demonstration

Presenter(s): Elena Zudilova-Seinstra, PhD, Sr. Product Manager Research Data, Elsevier RDM, Research Products

Target research community: Whatever your role in the lab – researcher, PI, lab manager.

Statement of Research Impact: Hivebench’s comprehensive, consistent and structured data capture provides a simple and safe way to manage and preserve protocols and research data.
Any special requirements: Access to an Internet connection.

 


Biography:

I’m a Senior Product Manager for Research Data at Elsevier. In my current role I focus on delivering tools for sharing and reuse of research data. Since 2014 I have been responsible for Elsevier’s Research Elements Program, focusing on innovative article formats for publishing data, software and other elements of the research cycle. Before joining Elsevier, I worked at the University of Amsterdam, SARA Computing and Networking Services and Corning Inc.

Breathing new life into old collections – Using Citizen Science to Revitalise Geoscience Australia’s Microscope Slide-Based Collections

Mr John Pring1, Dr Richard Blewett1, Mr Billie Poignand1, Mr Oliver Raymond1, Dr David Champion1, Ms Irina Bastrakova1, Mr Neal Evans1, Mr Peter Butler1, Dr Alastair Stewart1

1Geoscience Australia, Canberra, Australia, john.pring@ga.gov.au, richard.blewett@ga.gov.au, billie.poignand@ga.gov.au, oliver.raymond@ga.gov.au, david.champion@ga.gov.au, irina.bastrakova@ga.gov.au, neal.evans@ga.gov.au, peter.butler@ga.gov.au, alastair.stewart@ga.gov.au

 

DESCRIPTION

Since soon after the federation of Australia in 1901, Geoscience Australia and its predecessor organisations have gathered a significant collection of microscope slide based items (including thin sections of rock and microfossils) from across Australia, Antarctica, Papua New Guinea, the Asia Pacific region and beyond. The samples from which the microscope slides were produced were gathered through extensive geological mapping programs, work conducted for major Commonwealth building initiatives such as the Snowy Mountains Scheme, and science expeditions. The cost of recreating this collection would be measured in the hundreds of millions of Australian dollars, even assuming that it was still possible to source the relevant samples.

While access to these microscope slides is open to industry, educational institutions and the public, it has not been easy to locate specific slides because the collection was managed largely through an aged card catalogue and ledger system. The fragmented nature of this management system, together with the increasing potential for deterioration of the physical media and the loss of access to even some of the original contributors, meant that rescue work was (and still is) needed urgently.

Making the microscope slides discoverable and accessible in the current fiscally constrained environment dictated a departure from what might be considered a traditional approach to the project. Instead, the project made extensive use of citizen science through DigiVol, together with a small number of onsite volunteers.

Using this citizen science approach, the proof-of-concept project has seen the transcription of some 35,000 sample metadata and data records (2.5 times our current electronic holdings) from a variety of hardcopy sources by a diverse group of volunteers. The availability of this data has allowed for the electronic discovery of both the microscope slides and their parent samples, and will hopefully lead to greater utilisation of this valuable resource and enable new geoscientific insights from old resources.

Another benefit of using DigiVol has been increased positive exposure for Geoscience Australia among a totally new section of the general public. It has highlighted the role of the agency to an audience that previously had little or no involvement with the geosciences.

REFERENCES

  1. DigiVol citizen science transcription site available from https://volunteer.ala.org.au/ accessed 1 August 2017
  2. Geoscience Australia eCat Record http://pid.geoscience.gov.au/dataset/112965, created 28 Aug 2017

Biography:

John Pring holds a Masters of Management Studies (Project Management/Technology and Equipment) from the University of New South Wales and an Electrical Engineering Degree from the University of Southern Queensland.

He has been a Senior Project Manager within the Environmental Geoscience Division of Geoscience Australia for some 10 years and has run a number of projects associated with the management of the agency’s data and physical collections over that time.

He has held similar roles within other government agencies prior to joining Geoscience Australia.

Field Acquired Information Management Systems Project: FAIMS Mobile, a customisable platform for data collection during field research

A/Prof. Shawn Ross1, Dr Adela Sobotkova1, Dr Brian Ballsun-Stanton1

1Macquarie University, Sydney, Australia

Title: Field Acquired Information Management Systems Project: FAIMS Mobile, a customisable platform for data collection during field research
Synopsis: FAIMS Mobile is open-source, customisable software designed specifically to support field research across many domains. It allows offline collection of structured, text, multimedia, and geospatial data on multiple Android devices, and is built around an append-only datastore that provides complete version histories. It includes customisable export to existing databases or in standard formats. It is also designed for rapid prototyping and easy redeployment, reducing the costs of implementation. Developed for ‘small data’ disciplines, FAIMS Mobile is designed to collect heterogeneous data of various types (structured, free text, geospatial, multimedia) produced by arbitrary methodologies. Customised through an XML-based domain-specific language, it supports project-specific data models, user interfaces, and workflows, while also addressing problems shared across field-based projects, such as provision of a mobile GIS, data validation, delivery of contextual help, and automated synchronisation across multiple devices in a network-degraded environment. Finally, it promotes synthetic research and improves transparency and reproducibility through the production of comprehensive datasets that can be mapped to vocabularies or ontologies as they are created.
Format of demonstration: Slides / screenshots
Presenter(s): A/Prof Shawn A Ross, Director of Data Science and eResearch, Macquarie University and Co-Director, FAIMS Project.

Dr Adela Sobotkova, Research Associate, Department of Ancient History, Macquarie University and Co-Director, FAIMS Project.

Dr Brian Ballsun-Stanton, Research Associate, Department of Ancient History, Macquarie University and Technical Director, FAIMS Project.

Target research community: Researchers in fieldwork disciplines where people (rather than automated sensors) collect data, e.g., archaeology, biology, ecology, geosciences, linguistics, oral history, etc.
Statement of Research Impact: FAIMS Mobile has changed users’ daily practice. Case studies indicate that users benefit from the increased efficiency of fieldwork (the time saved by avoiding digitisation more than offsets the time required to implement the system). Born-digital data avoided the problems of delayed digitisation, which often occurred long after field recording, when the context of records had been forgotten. Researchers reported more complete, consistent, and granular data, and that information could be exchanged more quickly between field researchers and lab specialists, facilitating the evaluation of patterns for meaning. They also observed that the process of moving from paper to digital required comprehensive reviews of field practice, during which knowledge implicit in existing systems became explicit and data was modelled carefully for the first time.
Request to schedule alongside particular conference session:
Any special requirements: Nothing special.
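
The synopsis describes an append-only datastore that provides complete version histories. As a rough illustration of that general idea only, the following Python sketch stores every change as a new immutable revision rather than overwriting earlier values. It is not FAIMS Mobile’s actual implementation: the class names, field names, and sample values (`Version`, `AppendOnlyStore`, `context-001`, `device-A`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One immutable revision of a record (hypothetical structure)."""
    record_id: str
    version: int
    author: str
    values: Dict[str, Any]


class AppendOnlyStore:
    """Keeps every revision; nothing is updated or deleted in place."""

    def __init__(self) -> None:
        self._log: List[Version] = []

    def write(self, record_id: str, author: str, values: Dict[str, Any]) -> Version:
        # A "change" is a new entry whose version number is one higher
        # than the latest existing revision of the same record.
        previous = self.history(record_id)
        entry = Version(record_id, len(previous) + 1, author, dict(values))
        self._log.append(entry)
        return entry

    def history(self, record_id: str) -> List[Version]:
        # Complete version history of one record, oldest first.
        return [v for v in self._log if v.record_id == record_id]

    def current(self, record_id: str) -> Optional[Version]:
        # The latest revision represents the current state of the record.
        versions = self.history(record_id)
        return versions[-1] if versions else None


# Two devices record the same (hypothetical) excavation context in turn.
store = AppendOnlyStore()
store.write("context-001", "device-A", {"soil": "clay"})
store.write("context-001", "device-B", {"soil": "sandy clay"})
```

In this sketch, `store.current("context-001")` returns the second revision while `store.history("context-001")` still returns both, so an earlier observation remains recoverable after a later edit.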

Biography:

Shawn A Ross (Ph.D. University of Washington, 2001) is Associate Professor of History and Archaeology and the Director of Data Science and eResearch at Macquarie University.  A/Prof Rossʼs research interests include the history and archaeology of pre-Classical Greece, oral tradition as history (especially Homer and Hesiod), the archaeology of the Balkans (especially Thrace), Greece in its wider Mediterranean and Balkan context, and the application of information technology to research. Since 2009, the focus of A/Prof Rossʼs work has been fundamental archaeological research in Bulgaria. He is a Research Associate at the American Research Center in Sofia, Bulgaria, and supervises the Tundzha Regional Archaeological Project (http://www.tundzha.org), a large-scale archaeological survey and palaeoenvironmental study in central and southeast Bulgaria. Since 2012 A/Prof Ross has also directed the Field Acquired Information Management Systems (FAIMS) project (http://www.faims.edu.au/) aimed at developing data capture, management, and archiving resources for researchers in fieldwork-based disciplines. Previously, A/Prof Ross worked at the University of New South Wales (Sydney, Australia) and William Paterson University (Wayne, New Jersey).

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
