Semi-auto generated reports from a large dataset for non-expert users

Dr Rebecca Handcock1,2, Professor Cameron Neylon2, Dr Richard Hosking1,2, Aniek Roelofs1,2, Dr James Diprose2, Associate Professor Lucy Montgomery2, Dr Alkim Ozaygen2, Dr Katie Wilson2, Dr Chun-Kai (Karl) Huang2

1Curtin Institute for Computation, Curtin University, Bentley, Australia
2Curtin Open Knowledge Initiative, Curtin University, Bentley, Australia

BACKGROUND

The Academic Observatory (AO) dataset contains more than 12 trillion pieces of information on university research, publications, and funding, collected by the Curtin Open Knowledge Initiative (COKI). This dataset is used by researchers and strategic decision makers to understand university performance.

AO data is stored on Google Cloud Platform and is typically presented through data dashboards. Many users need custom data extracts presented as traditional reports, but may lack the technical expertise to extract the data themselves.

METHOD

Our method for generating these reports from the AO dataset is inspired by literate programming: each report is a templated document with code insertions. We use the “Precipy” Python library, with report-specific parameters held in a configuration file, and the analytics functions for data processing and visualisation defined in a customisable analytics module.
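
The idea of a templated document with code insertions can be illustrated with a minimal sketch that uses only the Python standard library. It does not use or reproduce the Precipy API. The configuration keys, the `open_access_share` function, and the numbers passed to it are hypothetical placeholders, not AO data.

```python
import json
from string import Template

# Hypothetical report parameters, standing in for a configuration file.
CONFIG_JSON = '{"title": "Open access summary", "country": "Australia", "year": 2019}'


# Hypothetical analytics function, standing in for an analytics module.
def open_access_share(total_outputs, open_outputs):
    """Return the percentage of outputs that are open access."""
    if total_outputs == 0:
        return 0.0
    return 100.0 * open_outputs / total_outputs


# Markdown template whose placeholders are filled in by code.
TEMPLATE = Template(
    "# $title\n\n"
    "In $year, $share% of research outputs from $country were open access.\n"
)


def render_report(config_json, total_outputs, open_outputs):
    """Combine the configuration and the computed value into report text."""
    config = json.loads(config_json)
    share = open_access_share(total_outputs, open_outputs)
    return TEMPLATE.substitute(
        title=config["title"],
        country=config["country"],
        year=config["year"],
        share=f"{share:.1f}",
    )


# Illustrative counts only.
report = render_report(CONFIG_JSON, total_outputs=200, open_outputs=50)
print(report)
```

Keeping the parameters, the analytics, and the template separate means a change to any one of them does not require editing the other two.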

The Python tools we developed for use with “Precipy” are designed for the domain context of the AO dataset. They cover managing data access, summarising tabular data, producing custom plots, and semi-automatically generating the blocks of expanded text common in such reports. These tools are combined with CSS and Markdown templates to control the final design and layout in PDF and HTML formats.
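
As a rough illustration of two of these tasks, the sketch below turns a small table into a Markdown table and a generated sentence. It is not the authors' tooling: the function names `markdown_table` and `trend_sentence` are hypothetical, and the rows are made-up values rather than AO data.

```python
# Made-up rows standing in for a tabular data extract.
ROWS = [
    {"year": 2017, "outputs": 120},
    {"year": 2018, "outputs": 150},
    {"year": 2019, "outputs": 180},
]


def markdown_table(rows, columns):
    """Render a list of dicts as a Markdown pipe table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider] + body)


def trend_sentence(rows, value_key):
    """Generate one sentence comparing the first and last rows."""
    first, last = rows[0], rows[-1]
    start, end = first[value_key], last[value_key]
    span = f"Between {first['year']} and {last['year']}"
    if start == end:
        return f"{span}, {value_key} stayed at {start}."
    verb = "rose" if end > start else "fell"
    return f"{span}, {value_key} {verb} from {start} to {end}."


table = markdown_table(ROWS, ["year", "outputs"])
sentence = trend_sentence(ROWS, "outputs")
print(table)
print()
print(sentence)
```

The generated sentence is only a starting point; in a semi-automated workflow a person would still review the wording before the report is released.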

RESULTS AND CONCLUSION

Our methodology supports generating multiple similar reports, such as reports on data from different countries, and repeated runs of the same report, such as monthly summaries. It addresses the need to generate reports from the large, complex AO dataset for non-expert users.
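
Producing a family of similar reports can be sketched as a loop that varies one parameter over a shared template. The example below is a simplified illustration under assumed names: the template, the configuration keys, the country list, and the file-naming scheme are all hypothetical, and it does not call Precipy.

```python
from string import Template

# Shared template; only the parameters change between reports.
TEMPLATE = Template("# $title: $country\n\nReporting period: $period\n")

# Hypothetical base parameters and list of countries.
BASE_CONFIG = {"title": "Research output summary", "period": "2020-01"}
COUNTRIES = ["Australia", "Canada", "Brazil"]


def build_reports(base_config, countries):
    """Return a mapping of output filename to rendered report text."""
    reports = {}
    for country in countries:
        # Copy the shared parameters and add the one that varies.
        config = {**base_config, "country": country}
        filename = f"{country.lower()}_{config['period']}.md"
        reports[filename] = TEMPLATE.substitute(config)
    return reports


reports = build_reports(BASE_CONFIG, COUNTRIES)
for name in sorted(reports):
    print(name)
```

A monthly re-run would follow the same pattern, with the `period` value changed instead of the country.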


Biography:

Rebecca Handcock is a Spatial Data Scientist with a PhD from the University of Toronto. Her research ranges from using remote sensing and sensor networks to monitor agriculture and water, to recent projects focusing on health, research evaluation and bibliometrics. Rebecca has previously spent 10 years as a research scientist at CSIRO, and has held roles within the academic sector including the University of Washington. She is part of Homeward Bound, a global initiative to foster leadership among women in STEMM fields.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
