Prof. Richard Sinnott¹
¹University of Melbourne, Melbourne, Australia
Over 700 Masters-level students at the University of Melbourne have been taught big data analytics on the NeCTAR Research Cloud since 2013 as part of the Cluster and Cloud Computing course. This course covers HPC programming, including MPI, as well as hands-on experience in the dynamic deployment and scaling of applications on the Cloud, typically to support big data analytics. Students are exposed to technologies including NoSQL systems such as CouchDB and Couchbase, Hadoop/HDFS and Spark, and learn how to write scalable Cloud solutions using scripting approaches such as Boto and Ansible.
This talk presents one example of student implementation work: real-time processing of social media data (Twitter, Instagram, Flickr and Foursquare) to better understand the way in which individuals move around the city, capturing the so-called pulse of the city. Such work provides insights into people’s daily routines that are otherwise impossible to capture. A key part of this work is data analytics and data visualization, including algorithms for sentiment analysis and the scalability of these algorithms across the NeCTAR Research Cloud. We describe the technical solutions that have been adopted, which reflect best practice in this space, and the lessons learned in using the NeCTAR Research Cloud for such analysis. We also discuss the potential dangers of such data use and the privacy issues that it gives rise to.
Professor Richard O. Sinnott is the Director of eResearch at the University of Melbourne and Chair of Applied Computing Systems. In these roles he is responsible for all aspects of eResearch (research-oriented IT development) at the University. He has been lead software engineer/architect on an extensive portfolio of national and international projects, with specific focus on those research domains requiring finer-grained access control (security). He is technical lead for the AURIN project and a range of other application domains. He has taught Cluster and Cloud Computing to over 400 Masters-level students at the University of Melbourne since 2013. He has supervised over 250 Masters dissertations in the last 4 years on a range of topics including big data analytics and Cloud computing, with extensive focus on social media data analytics and use of Twitter data. (Geoff and Chao are two such students!)