Dr Chao Sun1
1The University of Sydney, Camperdown, Australia, firstname.lastname@example.org
As the data scientist in the Faculty of Arts and Social Sciences, one of the requests I receive most often is for visualisation. In digital humanities, visualisations are mostly used for showing networks, presenting findings and analysing numeric records. A number of popular platforms are well designed and widely used for these purposes: Gephi is often used for exploring networks, NVivo is a good platform for analysing and visualising textual data, and Tableau is a popular business analytics solution for generating reports.
The existence of these tools makes it much easier to replicate and standardise some kinds of work. However, research is never limited to pre-designed functions, and many social science projects raise unique digital requirements in support of qualitative studies.
This presentation includes two showcases where visualisations were developed more as a research tool than as a standard visual outcome. In each case, popular software (Gephi or Tableau) was employed but adapted to present information differently.
SHOWCASE 1: TIMELINE OF WIKIPEDIA PAGES & EDITORS
Wikipedia is a very rich source of knowledge, but it also acts as a crowd-sourced media agency that gathers and records up-to-date information, especially for emergency events such as the 2014 Sydney hostage crisis (the Lindt Café siege).
One of our researchers was interested in how the relevant Wikipedia page was constructed, who contributed to the edits, and how the debates on this page arose and were eventually settled. The good news is that Wikipedia is fully open, and all revision details can be retrieved using the API. However, over the past two and a half years the page has accumulated more than 3,000 revisions, made by hundreds of editors with varying levels of involvement. It is impossible to read through the whole history and do a thorough qualitative study. The revision data can be gathered, cleansed and organised in a spreadsheet, but it is still not easy to find the right spot to drill into.
After a lot of communication, the researcher and I concluded that the best tool for approaching the research questions would be a timeline, on which the active editors, significant contributions and important time points can be visually identified. However, we knew of no existing tool that could generate such a timeline for this specific problem.
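As an illustration of the retrieval step, the following is a minimal sketch of paging through a page's revision history with the MediaWiki Action API, not the code used in the project. The function names, the `get_json` hook (which lets the fetching logic be tried without network access) and the User-Agent string are assumptions made for this example. Note that the API's `size` field is the page size in bytes, so a word count would have to be computed separately.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def http_get_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "revision-timeline-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_revisions(title, get_json=http_get_json):
    """Return all revisions of `title`, oldest first, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|size|comment",
        "rvlimit": "max",
        "rvdir": "newer",
    }
    revisions = []
    while True:
        data = get_json(params)
        for page in data.get("query", {}).get("pages", []):
            revisions.extend(page.get("revisions", []))
        if "continue" not in data:
            return revisions
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}
```

Calling `fetch_revisions("2014_Sydney_hostage_crisis")` would request the full history in batches; the resulting list of dictionaries can then be written to a spreadsheet or passed on for further analysis.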
Figure 1. Important Revision/Editor Timeline for the “2014_Sydney_hostage_crisis” Wikipedia Page.
As shown in Figure 1, Python functions were written to analyse the Wikipedia revision data and to generate a network graph with all coordinates; Gephi was then used to draw the nodes and edges as a visually informative research tool for studying the problem. This timeline visualisation shows only significant edits made by the most active editors. The X-axis is time and the Y-axis is the size of the article (word count). The node size represents how much change was made to the page, and the node colour stands for the editor. Edits made by the same editor share a colour and are linked together, so it is easy to see when an editor made a contribution and when they returned for more edits.
Because all nodes and edges are automatically generated, similar timelines can be quickly generated for various periods and selected editors. The interactive interface of Gephi makes the timeline an even more powerful tool with filtering, highlighting and customised information display capacity.
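The mapping from revisions to a pre-positioned graph can be sketched as follows. This is an assumed reconstruction for illustration, not the project's own functions: the names `build_timeline` and `to_gexf`, the thresholds `top_editors` and `min_change`, and the colour palette are all hypothetical. It also uses the revision's byte size as a stand-in for word count, and writes GEXF 1.2 with `viz` position, size and colour attributes, which Gephi can read on import. Coordinates are left unscaled and would likely need rescaling before display.

```python
from collections import Counter
from datetime import datetime
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"
VIZ_NS = "http://www.gexf.net/1.2draft/viz"
# Hypothetical palette; one RGB colour per active editor, reused cyclically.
PALETTE = [(228, 26, 28), (55, 126, 184), (77, 175, 74), (152, 78, 163), (255, 127, 0)]


def _parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def build_timeline(revisions, top_editors=5, min_change=100):
    """Return (nodes, edges) for large edits made by the most active editors."""
    revs = sorted(revisions, key=lambda r: r["timestamp"])
    if not revs:
        return [], []
    start = _parse(revs[0]["timestamp"])
    counts = Counter(r.get("user", "(hidden)") for r in revs)
    active = [user for user, _ in counts.most_common(top_editors)]
    nodes, edges, last_node = [], [], {}
    prev_size = 0
    for r in revs:
        change = abs(r["size"] - prev_size)  # how much this edit changed the page
        prev_size = r["size"]
        user = r.get("user", "(hidden)")
        if user not in active or change < min_change:
            continue
        days = (_parse(r["timestamp"]) - start).total_seconds() / 86400
        nodes.append({
            "id": r["revid"], "label": user,
            "x": days, "y": r["size"], "size": change,
            "color": PALETTE[active.index(user) % len(PALETTE)],
        })
        if user in last_node:  # link to the same editor's previous kept edit
            edges.append((last_node[user], r["revid"]))
        last_node[user] = r["revid"]
    return nodes, edges


def to_gexf(nodes, edges):
    """Serialise the graph as a GEXF 1.2 string with fixed node positions."""
    ET.register_namespace("", GEXF_NS)
    ET.register_namespace("viz", VIZ_NS)
    root = ET.Element(f"{{{GEXF_NS}}}gexf", {"version": "1.2"})
    graph = ET.SubElement(root, f"{{{GEXF_NS}}}graph", {"defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, f"{{{GEXF_NS}}}nodes")
    for n in nodes:
        el = ET.SubElement(nodes_el, f"{{{GEXF_NS}}}node",
                           {"id": str(n["id"]), "label": n["label"]})
        ET.SubElement(el, f"{{{VIZ_NS}}}position",
                      {"x": str(n["x"]), "y": str(n["y"]), "z": "0"})
        ET.SubElement(el, f"{{{VIZ_NS}}}size", {"value": str(n["size"])})
        red, green, blue = n["color"]
        ET.SubElement(el, f"{{{VIZ_NS}}}color",
                      {"r": str(red), "g": str(green), "b": str(blue)})
    edges_el = ET.SubElement(graph, f"{{{GEXF_NS}}}edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, f"{{{GEXF_NS}}}edge",
                      {"id": str(i), "source": str(source), "target": str(target)})
    return ET.tostring(root, encoding="unicode")
```

Regenerating the timeline for a different period or set of editors then amounts to filtering the revision list or changing the two thresholds before writing a new file.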
SHOWCASE 2: HIERARCHICAL INSCRIPTION DISPLAY WITH TABLEAU
In another research project, the business analytics platform Tableau is used purely as an interactive interface for displaying Buddhist inscriptions with multiple hierarchical levels of meaning.
In this work, the Gāndhārī inscriptions have been closely studied, analysed and modelled using a specially developed workbench for the study of ancient documents (READ). Texts, annotations and tags of the inscription are retrieved from an online database server and then processed using Python. Each grapheme in the inscription is displayed as a tile, with colours and shapes that vary across the panels of the Tableau workbook depending on the hierarchy and analysis relationships. The researcher can easily filter, highlight, click to display relevant levels and check the metadata attached to each token, as shown in Figure 2.
Figure 2. Tableau Workbook for Displaying Hierarchy Levels of Buddhist Inscription as Interactive Tiles.
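One way to prepare hierarchical text for tile display is to flatten it into a long table with one row per grapheme per hierarchy level, which Tableau can read as a CSV data source. The sketch below is an assumption about how such a step could look. The nested `level`/`label`/`children` structure, the function names and the column names are hypothetical and do not reflect the READ data model.

```python
import csv
import io

FIELDS = ["position", "grapheme", "level", "label"]


def tile_rows(tree):
    """Flatten a nested hierarchy into one row per (grapheme, level) pair.

    Leaves are treated as graphemes; `position` is the grapheme's index in
    reading order, so each level can be drawn as a row of aligned tiles.
    """
    rows = []
    position = 0

    def walk(node, ancestors):
        nonlocal position
        path = ancestors + ((node["level"], node["label"]),)
        children = node.get("children", [])
        if not children:
            # Emit one tile per level on the path from the root to this leaf.
            for level, label in path:
                rows.append({"position": position, "grapheme": node["label"],
                             "level": level, "label": label})
            position += 1
            return
        for child in children:
            walk(child, path)

    walk(tree, ())
    return rows


def to_csv(rows):
    """Serialise the tile rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a workbook built on such a table, `position` could be placed on columns, `level` on rows or panels, and `label` on colour, so that tiles belonging to the same unit at a given level share a colour.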
Setting aside Tableau's considerable number-crunching capacity, we turned it into an interactive text display platform on which researchers can view and explore the conceptual relationships among texts. This has proven to be a much more efficient and effective way of studying the inscriptions than exploring the tables stored in a database.
Digital humanities and social sciences is still quite a new area of research, with a lot of potential as well as problems. It is often necessary to design novel methodologies for approaching certain research questions with digital tools, and doing so requires a sufficient level of communication and understanding between the data scientist and the researcher.
However, it is often not necessary to build everything from scratch. With good problem-solving skills and some creativity, we can use existing tools in a different way and achieve the goals smartly.
Chao Sun obtained his PhD in data mining from the University of Wollongong, and joined the Faculty of Arts and Social Sciences at the University of Sydney as a data scientist in 2016. Chao has been supporting and collaborating on many digital humanities and social sciences research projects, providing services such as consulting, research methodology design, data collection and processing, and visualisation. Chao also works jointly with the Sydney Informatics Hub at USyd and serves as the representative for the TrISMA project (QUT). Chao is enthusiastic about acting as a bridge for cross-discipline collaborations between the faculty and a broader network of data-specialised researchers.