Semi-automatically generated reports from a large dataset for non-expert users

Dr Rebecca Handcock (1,2), Professor Cameron Neylon (2), Dr Richard Hosking (1,2), Aniek Roelofs (1,2), Dr James Diprose (2), Associate Professor Lucy Montgomery (2), Dr Alkim Ozaygen (2), Dr Katie Wilson (2), Dr Chun-Kai (Karl) Huang (2)

(1) Curtin Institute for Computation, Curtin University, Bentley, Australia
(2) Curtin Open Knowledge Initiative, Curtin University, Bentley, Australia


The Academic Observatory (AO) dataset contains more than 12 trillion pieces of information on university research, publications, and funding, collected by the Curtin Open Knowledge Initiative (COKI). This dataset is used by researchers and strategic decision makers to understand university performance.

AO data is stored in Google Cloud Platform and is typically presented through data dashboards. Many users, however, need custom data extractions presented as traditional reports, yet may lack the technical expertise to extract the data themselves.


Our method for generating these reports from the AO dataset is inspired by literate programming: each report is a templated document with embedded code. We use the "Precipy" Python library, with report-specific parameters held in a configuration file and the analytics functions for data processing and visualisation defined in a customisable analytics module.
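The pattern of a configuration file, an analytics module and a template with insertion points can be sketched as follows. This is a minimal illustration using only the Python standard library, not Precipy's own API: the configuration keys, the `ANALYTICS` mapping, the `render_report` function and the toy output counts are all hypothetical.

```python
import json
from string import Template

# Hypothetical configuration; in practice this would be read from a file.
# The keys are illustrative and are not Precipy's configuration schema.
CONFIG_JSON = '{"title": "Research output summary", "country": "Country A", "year": 2019}'

# Toy stand-in for a query result from the AO dataset (made-up values).
TOY_OUTPUTS = {("Country A", 2019): 1200, ("Country B", 2019): 800}


def total_outputs(config):
    """Analytics function: look up the output count for the configured country and year."""
    return TOY_OUTPUTS.get((config["country"], config["year"]), 0)


# Hypothetical "analytics module": insertion-point name -> function.
ANALYTICS = {"total_outputs": total_outputs}

# Markdown template with named insertion points ($name).
TEMPLATE = Template(
    "# $title\n"
    "\n"
    "In $year, institutions in $country produced $total_outputs research outputs.\n"
)


def render_report(config_json):
    """Load the configuration, run each analytics function, then fill the template."""
    config = json.loads(config_json)
    values = dict(config)
    for name, func in ANALYTICS.items():
        values[name] = func(config)
    return TEMPLATE.substitute(values)


print(render_report(CONFIG_JSON))
```

Keeping the report text in the template and the computation in separate functions means a new report usually needs only a new configuration, not new code.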

The Python tools we developed for use with "Precipy" were designed for the domain context of the AO dataset. They cover data access, summaries of tabular data, custom plots, and the semi-automatically generated blocks of explanatory text that are common in such reports. These tools are combined with CSS and Markdown templates that control the final design and layout in PDF and HTML formats.
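As an illustration of semi-automatically generated text, the sketch below turns numbers and small tables into report sentences. The helper names (`join_items`, `describe_change`, `summarise_top`) and the sentence wording are hypothetical and are not taken from the authors' tools.

```python
def join_items(items):
    """Join names as 'A', 'A and B', or 'A, B and C'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_change(label, before, after, start, end):
    """Describe the change in a metric between two periods as one sentence."""
    if before == 0:
        return f"{label} was {after} in {end} (no {start} baseline)."
    pct = 100.0 * (after - before) / before
    if pct > 0:
        direction = "increased"
    elif pct < 0:
        direction = "decreased"
    else:
        return f"{label} was unchanged at {after} between {start} and {end}."
    return (f"{label} {direction} from {before} in {start} to {after} in {end} "
            f"({abs(pct):.1f}%).")


def summarise_top(table, key, n=3):
    """Name the top-n rows of a table (a list of dicts) ranked by `key`."""
    ranked = sorted(table, key=lambda row: row[key], reverse=True)[:n]
    names = join_items(row["name"] for row in ranked)
    return f"The top {len(ranked)} by {key} were {names}."


# Usage with made-up values.
rows = [
    {"name": "University A", "outputs": 10},
    {"name": "University B", "outputs": 30},
    {"name": "University C", "outputs": 20},
    {"name": "University D", "outputs": 5},
]
print(describe_change("Total outputs", 200, 250, 2018, 2019))
print(summarise_top(rows, "outputs"))
```

Because each helper returns a plain string, its result can be dropped into a Markdown template at an insertion point like any other computed value.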


Our methodology makes it straightforward to generate multiple similar reports, such as one report per country, or to re-run the same report periodically, such as a monthly summary. It addresses the need of non-expert users for reports drawn from the large and complex AO dataset.
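Generating many similar reports amounts to producing one configuration per variant and passing each to the report generator. The sketch below assumes a hypothetical configuration layout (`title`, `formats`, `country`, `month`, `output`) and file-naming scheme; it shows the idea, not the authors' actual configuration format.

```python
import itertools
import json


def build_configs(base, countries, months):
    """Return one report configuration per (country, month) pair.

    `base` holds settings shared by every report; the per-report fields
    are added to a copy, so `base` itself is left unchanged.
    """
    configs = []
    for country, month in itertools.product(countries, months):
        cfg = dict(base)
        cfg.update({
            "country": country,
            "month": month,
            # Hypothetical naming scheme for the output files.
            "output": f"report_{country.lower().replace(' ', '_')}_{month}",
        })
        configs.append(cfg)
    return configs


base = {"title": "Monthly research summary", "formats": ["pdf", "html"]}
configs = build_configs(base, ["Country A", "Country B"], ["2020-01", "2020-02"])
for cfg in configs:
    # Each configuration could be saved as its own file and passed to the
    # report generator.
    print(json.dumps(cfg))
```

Adding a country or a new month then means extending a list rather than editing any report template.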


Rebecca Handcock is a Spatial Data Scientist with a PhD from the University of Toronto. Her research ranges from using remote sensing and sensor networks to monitor agriculture and water, to recent projects on health, research evaluation and bibliometrics. Rebecca previously spent 10 years as a research scientist at CSIRO and has held academic roles, including at the University of Washington. She is part of Homeward Bound, a global initiative to foster leadership among women in STEMM fields.