Mr Peter Green¹, Dr Pauline Joseph¹, Ms Amanda Bellenger¹, Mr Aaron Kent¹, Mr Matthew Robinson¹
¹Curtin University, Perth, Australia, P.Green@curtin.edu.au, P.Joseph@curtin.edu.au, A.Bellenger@curtin.edu.au, Aaron.J.Kent1@gmail.com, Matt.Robinson@curtin.edu.au
Curtin University Library manages authenticated access to its online journal, book and database collections using EZproxy, a URL-rewriting proxy service. EZproxy mediates each request between the user and the publisher platform on the Library's behalf. The service is widely deployed in libraries worldwide and has been a standard authentication solution in the industry for many years. EZproxy writes a log entry for each request in the Combined Log Format. The log files are extensive, at approximately 30 million lines per month, and record for each request details such as the IP address, client ID, date and time, the HTTP request line and the response status and size. The Curtin Library has retained at least five years of these log files.
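To make the log structure concrete, the sketch below parses one line in the standard Combined Log Format (`host ident user [time] "request" status bytes "referer" "user-agent"`). It assumes that EZproxy is configured to emit exactly this layout and that the client ID appears in the authenticated-user field; the sample line and the function name `parse_line` are hypothetical illustrations, not taken from Curtin's logs.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one Combined Log Format line into a dict, or return None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 08/Jun/2018:10:15:32 +0800
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A "-" in the bytes field means no body was sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # The request line is normally "METHOD URL PROTOCOL"
    parts = rec["request"].split()
    rec["method"], rec["url"] = (parts[0], parts[1]) if len(parts) >= 2 else (None, None)
    return rec


# Hypothetical example line (documentation IP range, made-up client ID)
SAMPLE = ('203.0.113.7 - s1234567 [08/Jun/2018:10:15:32 +0800] '
          '"GET http://www.example.com:80/article/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Returning `None` for non-matching lines lets a bulk loader skip malformed entries rather than abort partway through a 30-million-line month.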
This large dataset presents an opportunity to learn more about the information-seeking behaviour of Curtin Library clients, but it also presents a challenge: traditional analysis of such data tends to produce aggregated usage statistics that do not reveal activity at a granular level. Immersive visualisation could provide a way to see the data anew and reveal insights into that behaviour. In collaboration with Dr Pauline Joseph, Senior Lecturer (School of Media, Creative Arts and Social Inquiry), the Curtin Library proposed this work for funding under the Curtin HIVE Research Internships program. The proposal was successful, and a computer science student, Aaron Kent, was employed for ten weeks to produce visualisations from the EZproxy log file dataset.
The data was anonymised to protect client confidentiality whilst retaining its granularity, and the number of lines in the log files was reduced by removing ‘noise’. Unity3D was chosen because it can produce visualisations for both the large screens of the HIVE and ordinary desktop screens. Many possible visualisations were discussed, but two were chosen for the internship.
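The abstract does not say how anonymisation or noise removal was done. One common approach, sketched below purely as an assumption, is to replace identifiers with a keyed hash (HMAC-SHA256), so that the same client always maps to the same token and per-user granularity survives, and to drop static-asset and failed requests as noise. The key, the list of "noise" extensions and the status-code rule are all hypothetical choices.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Hypothetical secret key: must be random and kept apart from the released data.
SECRET_KEY = b"replace-with-a-private-random-key"

# Hypothetical definition of noise: page furniture rather than content requests.
NOISE_EXTENSIONS = (".css", ".js", ".gif", ".png", ".jpg", ".jpeg",
                    ".ico", ".svg", ".woff", ".woff2")


def pseudonymise(value, key=SECRET_KEY):
    """Map an identifier to a stable 16-hex-character token via HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def is_noise(record):
    """Treat static assets and non-2xx/3xx responses as noise (an assumption)."""
    path = urlsplit(record["url"]).path.lower()
    return path.endswith(NOISE_EXTENSIONS) or not 200 <= record["status"] < 400


def clean(records):
    """Yield de-noised copies of the records with host and user pseudonymised."""
    for rec in records:
        if is_noise(rec):
            continue
        out = dict(rec)  # copy so the caller's records are not modified
        out["host"] = pseudonymise(rec["host"])
        out["user"] = pseudonymise(rec["user"])
        yield out
```

Because a hashed IP address can no longer be geolocated, any geolocation step would have to run before `clean`, storing coarse coordinates alongside each record.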
The first visualisation focusses on the behaviour of individual users in time and space, representing each information request in an inverted waterfall display on a global map, as illustrated in Figure 1. Different shapes represent different client groups, and the size of each object reflects the size of the information request. Each request is anchored on the map using geolocation information.
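The mapping from a log record to an object on the globe can be illustrated with a small, engine-independent sketch. It assumes an equirectangular map centred on the origin, interprets the "inverted waterfall" as markers rising above the map as they age, and scales objects logarithmically by bytes transferred so large downloads do not dwarf everything else. The group-to-shape table and all parameter names are hypothetical; in Unity the resulting values would be assigned to a GameObject's position and scale.

```python
import math
from dataclasses import dataclass

# Hypothetical client groups mapped to Unity primitive shape names.
GROUP_SHAPES = {"staff": "cube", "student": "sphere", "external": "cylinder"}


@dataclass
class Marker:
    x: float      # east-west position on the map plane
    z: float      # north-south position on the map plane
    y: float      # height above the map, grows with the request's age
    scale: float  # object size, grows with bytes transferred
    shape: str


def to_marker(lat, lon, age_seconds, size_bytes, group,
              map_width=360.0, map_depth=180.0, rise_rate=0.01):
    """Place one request on an equirectangular map centred at (0, 0)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # Equirectangular projection: degrees map linearly onto the plane.
    x = lon / 180.0 * (map_width / 2)
    z = lat / 90.0 * (map_depth / 2)
    # Assumed waterfall behaviour: newer requests sit at the surface, older ones higher.
    y = age_seconds * rise_rate
    # Logarithmic sizing keeps small and very large responses both visible.
    scale = 0.1 + math.log10(1 + size_bytes)
    shape = GROUP_SHAPES.get(group, "capsule")  # fallback for unknown groups
    return Marker(x, z, y, scale, shape)
```

For example, a 999-byte request from Perth (about 31.95° S, 115.86° E) that is 100 seconds old yields a sphere at roughly (115.86, 1.0, −31.95) with scale 3.1.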
Figure 1: Global user visualisation
The second visualisation focusses on the usage of particular scholarly resources over time, representing each information request as a building block in a 3D city, as illustrated in Figure 2. For each resource, the location of the blocks distinguishes the client groups and their size shows the volume of requests over time.
Figure 2: Scholarly resource visualisation
The successful visualisation prototypes have shown that the EZproxy log file data is a rich source for immersive visualisation, and further development will yield tools that Curtin Library can use to better understand the information-seeking behaviour of its clients.
- EZproxy Documentation. Available from: https://www.oclc.org/support/services/ezproxy/documentation/learn.en.html accessed 8 June 2018.
- Combined Log Format. Available from: http://fileformats.archiveteam.org/wiki/Combined_Log_Format accessed 8 June 2018.
- Curtin HIVE (Hub for Immersive Visualisation and eResearch). Available from: https://humanities.curtin.edu.au/research/centres-institutes-groups/hive/ accessed 8 June 2018.
- Unity3D. Available from: https://unity3d.com/ accessed 8 June 2018.
Peter Green is the Associate Director, Research, Collections, Systems and Infrastructure in the Curtin University Library, Australia. He is responsible for providing strategic direction, leadership and management of library services in support of research; the acquisition, management and discovery of, and access to, scholarly information resources; and information technology, infrastructure and facilities.