Generalised Platforms for Small Data Research: Lessons from Six Years of FAIMS Mobile

Brian Ballsun-Stanton1, Shawn A Ross2, Adela Sobotkova3

1Macquarie University, Sydney, Australia, brian.ballsun-stanton@mq.edu.au

2Macquarie University, Sydney, Australia, shawn.ross@mq.edu.au

3Macquarie University, Sydney, Australia, adela.sobotkova@mq.edu.au 

Data collection is fundamental to field research from archaeology to environmental science. Scientists, engineers, and technicians are turning to mobile devices such as tablets and smartphones to capture data in the field. Citizen science and community heritage projects allow crowdsourcing of data collection beyond what traditional research teams can accomplish, while educating and involving the public in the scientific endeavour. While some software is available to support general data collection, none provides the functionality or flexibility necessary for environmental and cultural research and monitoring in diverse circumstances. Neither does existing software support the development and sharing of new features and functions to foster communities of practice.

The lack of such software hinders field research. In a recent edition of the journal Science, McNutt et al. argue that ‘field sciences’ like archaeology, geology, and ecology lack transparency and reproducibility, compromising research results [1]. Too often, data sharing in field research amounts to merely ‘data and samples available upon request’ [1], while analytical processes, like the code used to process datasets, are not available at all [2]. Much human-mediated field research suffers from ‘small science’ data problems: diverse and idiosyncratic data, customised methodologies and recording systems, lack of standards, and limited budgets, which together restrict the availability of high-quality, compatible data [3,4]. This problem is exacerbated by new data-intensive field research methods, like geophysics and photogrammetry, that have increased the quantity of data being collected by researchers. The culture of field research has often preferred one-off solutions built for individual projects or organisations, a situation that leads to duplication, under-resourced development efforts, sustainability problems, and unfamiliarity with good practice. It also ‘prioritizes publications, innovation, and insight, which puts data stewardship and reuse far down the list’ [1]. Researchers tend to make do with mass-market software not designed with research in mind and unresponsive to community needs, requiring them to compromise their approaches [5]. As a consequence, field researchers often organise information idiosyncratically, using an ad-hoc mix of hard copy, data fragments in various formats, and bespoke databases [3,4,6,7]. Data then gets trapped in hard-copy archives, local storage, or digital ‘silos’, all of which make data difficult to discover and reuse [8]. Where digital datasets exist, they are often highly variable, of poor quality, and incompatible. 
These deficiencies not only waste time and effort and slow the publication of field research, but also inhibit reproduction or verification of results, independent analyses of primary data, the application of new techniques to old datasets, and the combination of datasets from multiple studies for large-scale research to address ‘grand challenges’ in field-based disciplines [1,6,9]. Furthermore, in many cases they do not meet international good practice (e.g., the FAIR Data Principles [10]) or the data management and dissemination expectations of funders (e.g., the US National Science Foundation or the ARC [11]).

Field researchers have long struggled with the digitisation bottleneck: online data services have existed for years, but they remain under-populated because getting findable, accessible, interoperable, and reusable (FAIR) digital data into them is costly and time-consuming. Stocktaking by the US Geological Survey and Bureau of Reclamation indicates that no existing software meets field researchers’ needs [12], while reliable bespoke software is difficult and expensive to create and maintain.

Thus, small-data disciplines in the sciences, social sciences, and humanities are characterised by limited resources, diverse practice, and heterogeneous data. Information infrastructure often emerges from – and during – research [4]. Mass-market database software requires time-consuming customisation and fails to meet many research needs, while bespoke software development is costly and unsustainable. What is the solution?

The authors have six years’ experience developing and deploying FAIMS Mobile, a platform that allows researchers to generate custom field data collection software with tailored interfaces, data structures, automation, and other features [13].

Based on this background, we argue that researchers in small-data disciplines deserve fit-for-purpose, research-specific applications, and we discuss the key features of sustainable small-data software, including:

  • ‘Generalised’ architectures, in which the ‘core’ software meets research-specific needs, can be used across many disciplines (facilitating a larger user community and spreading costs), and also allows profound customisation for the diverse data and workflows in our disciplines at a lower cost than bespoke software development.
  • Modular, ‘loosely coupled’ approaches, where independent applications work together, with the expectation that while individual components come and go due to the vagaries of funding and software development, researchers will never be left stranded.
  • Open-source licensing, allowing limited resources from multiple organisations to be pooled, and for software to be passed from one project or organisation to the next, with minimal friction.

We will briefly present a few examples of such software, then discuss how FAIMS Mobile implemented these principles, how it worked in the field, and how we would approach such software today based on lessons learned.

REFERENCES

  1. McNutt M, et al. Liberating field science samples and data. Science. 2016 Mar 4;351(6277):1024–6.
  2. Marwick B. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. J Archaeol Method Theory. 2017 Jun 1;24(2):424–50.
  3. Kansa EC, Bissell A. Web syndication approaches for sharing primary data in ‘small science’ domains. Data Science Journal. 2010;9:42–53.
  4. Borgman CL. Big data, little data, no data: scholarship in the networked world. MIT Press; 2015.
  5. Sobotkova A, et al. Measure Twice, Cut Once: Cooperative Deployment of a Generalized, Archaeology-Specific Field Data Collection System. In: Averett EW, Gordon JM, Counts DB, (eds.) Mobilizing the Past for a Digital Future: The Potential of Digital Archaeology. The Digital Press @ University of North Dakota; 2016. p. 337–72.
  6. Kintigh K. The Promise and Challenge of Archaeological Data Integration. Am Antiq. 2006;71(3):567–78.
  7. Snow DR, et al. Cybertools and archaeology. Science. 2006;311(5763):958–9.
  8. Blanke T, Hedges M. A Data Research Infrastructure for the Arts and Humanities. In: Managed Grids and Cloud Systems in the Asia-Pacific Research Community. Springer, Boston, MA; 2010. p. 179–91.
  9. Kintigh K, et al. Grand challenges for archaeology. Proc Natl Acad Sci U S A. 2014 Jan 21;111(3):879–80.
  10. Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016 Mar 15;3:160018.
  11. NSF. Dissemination and Sharing of Research Results [Internet]. Office of Budget Finance & Award Management. 2004 [cited 2017 Mar 29]. Available from: https://www.nsf.gov/bfa/dias/policy/dmp.jsp
  12. DataApp | Research and Development Office [Internet]. Bureau of Reclamation. 2017 [cited 2018 Mar 27]. Available from: https://www.usbr.gov/research/challenges/dataapp.html
  13. Ballsun-Stanton B, et al. FAIMS Mobile: Flexible, open-source software for field research. SoftwareX. 2018.

Biography:

Brian Ballsun-Stanton: https://orcid.org/0000-0003-4932-7912

Shawn Ross: https://orcid.org/0000-0002-6492-9025

Adela Sobotkova: https://orcid.org/0000-0002-4541-3963

Adela Sobotkova is a Research Fellow at Macquarie University, Sydney. Her research combines archaeology and digital methods to study the long-term history of the Balkans and Black Sea region, with emphasis on the evolution of social complexity.

Dr Sobotkova is a landscape archaeologist who studies past settlement patterns in their environmental context, with special focus on the rise and decline of social complexity and human-environment interactions. Much of her research involves aggregation of datasets for large-scale synthetic studies. Dr Sobotkova is an advocate of reproducible workflows and deep digital practice in archaeology; her forte is open-source mobile field recording, data management, and regional remote sensing for cultural heritage monitoring.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
