Dimensions the next generation approach to data discovery

Ms Anne Harvey1

1Digital Science, Carnegie, Australia

 

The research landscape exists in silos, often split by proprietary tools and databases that do not meet the needs of the institutions they were developed for. What if we could change that? In this session we’ll showcase Dimensions: a platform developed by Digital Science in collaboration with over 100 research organizations around the world to provide a more complete view of research from idea to impact.

We’ll discuss how the data now available enables institutions to more easily gather the insights they need to inform the most effective development of their organization’s activities, and look at how linking different sections of the scholarly ecosystem (including grants, publications, patents and data) can deliver powerful results that can then be integrated into existing systems and workflows through the use of APIs and other applications.

In particular, we’ll explore how the Dimensions approach to re-imagining discovery and access to research will transform the scholarly landscape, and the opportunities it presents for the research community.


Biography:

Anne Harvey is the Managing Director for Digital Science Asia Pacific, with overall responsibility for supporting clients with their research management objectives.

Anne has been involved in a number of projects, including Big Data Computing (the ability of an organisation to create, manipulate, manage and analyse large data sets, and to drive knowledge creation from them) and Australia’s ERA 2010 and 2012 research assessment exercises.

Anne has a passion for information and research; her previous positions include Regional Sales Manager at Elsevier and Business Development Manager at Thomson Reuters.

Singularity… Not as scary as you think it is!

Mr Jafaruddin Lie1

1Monash University, Clayton, Australia, jafar.lie@monash.edu

 

Singularity [1] is a container solution developed specifically with high performance computing in mind. Singularity makes it possible to install programs that will not run on the host operating system by creating a container of the operating system that will run the application. An example of how we use it on MASSIVE M3 is the installation of Caffe [2], the Python-based deep learning framework. MASSIVE M3 primarily runs the CentOS 7 operating system.

Most of the library dependencies needed by Caffe are not available on CentOS because the libraries provided by the operating system lag behind the latest releases. It is, of course, possible to manually compile and install all of these dependencies as modules and build Caffe that way; however, this is very time consuming. The manual process took us one working week to find, configure, compile, and test properly, while the Singularity build took us one day. We found no significant performance difference between the natively compiled Caffe and the one installed in the Singularity container.

Another advantage of Singularity is that containers are reproducible. The definition files used to build the containers can be shared, which makes building applications faster and standardised across research teams. Overall, we find that using Singularity to install applications enables us to deliver applications to our users faster, lets us reliably reproduce and upgrade the applications in the containers, and ensures that we never have to compile any application ever again. It keeps us sane.
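For concreteness, a definition file for a build like the Caffe one described above might look roughly like the sketch below. The base image, package list and paths are illustrative assumptions, not our production recipe:

```
Bootstrap: docker
From: ubuntu:16.04

%post
    # Pull Caffe's build dependencies from a distribution whose
    # packages are recent enough (package names are illustrative)
    apt-get update
    apt-get install -y build-essential git python-dev python-pip \
        libprotobuf-dev protobuf-compiler libboost-all-dev \
        libopencv-dev libhdf5-serial-dev
    # ...clone and build Caffe here...

%runscript
    exec python "$@"
```

The container is then built once and run anywhere Singularity is installed, along the lines of `singularity build caffe.simg caffe.def` followed by `singularity exec caffe.simg python -c "import caffe"`.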


Biography:

Jafar spent 14 years working in the field of system administration and network security before moving to Monash. He is now happily working with the people in the Monash eResearch HPC team, occasionally referencing Pokemon Go in conversations around the office.

Workspace for Industry 4.0

Dr Damien Watkins1, Lachlan Hetherton1, Nerolie Oakes, David Thomas, Nirupama Sankaranarayanan

1CSIRO, Clayton, Australia

 

Introduction

Industry 4.0 (a.k.a. “fourth industrial revolution”) refers to the amalgamation of automation & robotics, Internet of Things (IoT), network communications, cloud/cluster computing, artificial intelligence and human-computer interaction in manufacturing systems. Many of the benefits and challenges in achieving Industry 4.0 are common to the “digital transformation” of other domains and industries. Scientific Workflow Systems (SWSs) have been used both within and between many scientific domains to facilitate such digital transformations. Currently CSIRO is using its own SWS, Workspace, to prototype elements of an Industry 4.0 architecture and test the usefulness of using an SWS as the backbone of an Industry 4.0 platform. In this paper we introduce the concepts of Industry 4.0, SWSs and Workspace, and describe the suitability of the Workspace framework for creating applications and workflows compatible with Industry 4.0 requirements.

Industry 4.0

Industry 4.0 (CSIRO Manufacturing (2018), German Trade and Invest (2018), Wikipedia 2018) is driven by a number of factors including: increasing computational power and data volumes, advances in connectivity, advancing analytics and machine/business-intelligence, new forms of human-machine interaction and improvements in transferring digital instructions to the physical world (Department of Industry, Innovation and Science 2018). Industry 4.0 systems often create a “Digital Twin” of a real manufacturing environment to support monitoring and facilitate decision making – either by the systems themselves or by human operators. These manufacturing environments are not limited to a single physical site: indeed, they may span many traditional borders such as companies, countries, computer systems and technology domains. This means that they are inherently heterogeneous and complex. Platforms that simplify the development and/or deployment of such systems will add to their attractiveness to industry.

Workspace

An SWS is a workflow system that allows the composition and execution of a sequence of computational steps in a scientific application. Many SWSs support distributed development by multiple scientists, and as such they normally provide support for: storing the workflow description in a generic format (e.g. XML), execution on multiple operating systems (e.g. Linux, macOS, Windows), the use of multiple programming languages, interfaces for calling different execution environments (e.g. interactive, batch, PBS, SLURM, AWS, etc.), visualisation capabilities and so forth. The depth of any such support varies greatly between SWSs (Deelman 2009, Gil 2007).

Workspace (Cleary et al. 2014, Cleary et al. 2015, Cleary et al. 2017, Workspace 2014, Watkins 2017) is an SWS that supports the creation of scientific workflows and applications for commercial and research purposes. Under development at CSIRO since 2005, Workspace has been used in a number of different scientific domains. The Workspace framework provides a single, cross-platform environment to develop and execute scientific software tools and libraries that can be easily accessed by a wide range of users. Examples of Workspace-based workflows and applications in different scientific domains include: ArcWeld (Murphy and Thomas, 2014, 2018), Amicus (Sullivan et al. 2013), Dive Mechanic (Cohen et al. 2018), HelioSim (Potter 2018) and Spark (Miller et al. 2015). Many of these applications support the development of a Digital Twin of a real-world system augmented with CSIRO’s advanced modelling capabilities.

Workspace and Industry 4.0

As can be seen from the descriptions above, an SWS attempts to address many of the challenges of an Industry 4.0 system, providing support for interoperability, digital twinning, visualization, cluster/cloud execution and so on. A key feature of Workspace and some other SWSs is the capability for users to extend the inbuilt functionality via an extensible plugin architecture. This flexibility allows individuals and teams to easily add their own data types, algorithms and GUI components into the framework to use and share with others. In the case of Workspace, the plugin architecture has been used to expose a number of popular scientific libraries (such as OpenCV (Open Source Computer Vision Library), PCL (Point Cloud Library) and VTK (Visualization Toolkit)) and languages (such as Python, R and MATLAB). One key advantage of Workspace is that it facilitates the creation of standalone applications with custom GUIs that hide the underlying workflows. This makes complex scientific software easy to use in an industrial setting – end users on the factory floor often require a simple GUI application that only exposes access to the information and controls necessary for the task at hand. ArcWeld, Dive Mechanic, and Spark are examples of rich underlying workflows that have been packaged into intuitive applications for use in situ by the end user.

Industry 4.0 demonstration facility

An Industry 4.0 demonstration lab is currently under construction at CSIRO Clayton. The system uses a number of devices including DSLR Cameras (Nikon), Time of Flight (ToF) Cameras (ODOS Swift), Projectors (Casio), Kinects (Microsoft) and other devices. The system makes use of a number of computational modelling capabilities developed by the Computational Modelling and Simulation Group of CSIRO. The system has a number of integrated libraries such as CSIRO’s Stereo Depth Fusion functionality. Numerous Workspace-based application/workflows are being developed for tasks such as: individual device control, communications, and visualisation. Mixed Reality output devices (i.e. Hololens, Meta 2) will soon be integrated into the system.

 Figure 1: Logical view of the CSIRO Clayton Mixed Reality Laboratory

Conclusion

Industry 4.0 and SWSs offer similar benefits while addressing similar challenges. Evaluating the suitability of using an SWS such as Workspace in an Industry 4.0 environment using a purpose-built testbed should provide valuable insights into the applicability of the approach. Although this is a work in progress, initial results have been positive and we expect to have more insights to share as the deployment of our testbed continues.

REFERENCES

Cleary, P., Bolger, B., Hetherton, L., Rucinski, C., Thomas, D., Watkins, D. (2014), Workspace: A Platform for Delivering Scientific Applications, Proc. eResearch 2014, Melbourne, Australia, 27-31 October.

Cleary, P.W., Thomas, D., Bolger, M., Hetherton, L., Rucinski, C., and Watkins, D., (2015), Using Workspace to automate workflow processes for modelling and simulation in engineering, MODSIM 2015, Gold Coast, Australia, December 2015.

Cleary, P. W., Watkins, D., Hetherton, L., Bolger, M. and Thomas, D., (2017), Opportunities for workflow tools to improve translation of research into impact, 22nd International Congress on Modelling and Simulation (MODSIM 2017), Hobart, Tasmania, Australia, 3-8th December 2017.

Cohen, R. C. Z., Harrison, S. M., and Cleary P. W., (2018), Dive Mechanic: Bringing 3D virtual experimentation to elite level diving using the Workspace workflow engine, submitted to special issue: Mathematics and Computers in Simulation.

CSIRO Manufacturing (2018), Advanced Manufacturing Roadmap https://www.csiro.au/en/Do-business/Futures/Reports/Advanced-manufacturing-roadmap

Deelman, E., Gannon, D., Shields, M., and Taylor, I., (2009), Workflows and e-Science: An Overview of Workflow System Features and Capabilities, Future Generation Computer Systems, 25(5), May 2009, pp. 528–540. DOI 10.1016/j.future.2008.06.012.

Department of Industry, Innovation and Science (2018), Industry 4.0 URL https://industry.gov.au/industry/Industry-4-0/Pages/default.aspx.

German Trade and Invest (2018) INDUSTRIE 4.0 https://www.gtai.de/GTAI/Navigation/EN/Invest/Industries/Industrie-4-0/Industrie-4-0/industrie-4-0-what-is-it.html

Gil, Y., Deelman E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., and Myers, J., (2007), Examining the Challenges of Scientific Workflows, IEEE Computer, vol. 40, no. 12, pp. 24-32, December, 2007.

Miller, C., Hilton J., Sullivan A. and Prakash M., (2015), SPARK – A Bushfire Spread Prediction Tool, R. Denzer et al. (Eds.), Environmental Software Systems. Infrastructures, Services and Applications, 448, 262–271.

Murphy, T., Thomas, D., (2014), A user-friendly predictive model of arc welding of aluminium, Proc. 4th IIW Welding Research & Collaboration Colloquium, Wollongong, Australia, 5-6 November 2014, pp. 47.

Murphy. A. B., and Thomas, D. G., (2018), A computational model of arc welding – from a research tool to a software product, submitted to special issue: Mathematics and Computers in Simulation.

Potter, D. F., Khassapov, A., Pascual, R., Hetherton, L., and Zhang, Z., (2018), Heliosim: A Workspace-driven application for the optimisation of solar thermal power plants, submitted to special issue: Mathematics and Computers in Simulation.

Sullivan, A., Gould, J., Cruz, M., Rucinski, C., and Prakash, M., (2013), Amicus: A national fire behaviour knowledge base for enhanced information management and better decision making, 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1–6 December 2013.

Watkins, D., Thomas, D., Hetherton, L., Bolger, M. and Cleary, P.W., (2017), Workspace – a Scientific Workflow System for enabling Research Impact, 22nd International Congress on Modelling and Simulation (MODSIM 2017), Hobart, Tasmania, Australia, 3-8th December 2017.

Wikipedia 2018, Industry 4.0: https://en.wikipedia.org/wiki/Industry_4.0.


Biography:

Dr Damien Watkins is the Research Team Lead for the Computational Software Engineering and Visualisation team at Data61/CSIRO. His team is responsible for the development of Workspace, a scientific workflow platform used on projects across CSIRO and a number of Workspace-based applications.  Workspace has been available for external usage since 2014.

Rapid solution prototyping with open data and Jupyter notebook

Ms Kerri Wait1

1Monash University, Clayton, Australia, kerri.wait@monash.edu 

 

Open data initiatives have the potential to accelerate research activities, but with the sheer number of data formats, tools, and platforms available, it can be difficult to know where to begin and which approach to take. In this talk I’ll consider a hyperthetical [1] research project to acquire data on the quality of lamingtons in each Victorian local government area. I’ll show how Python scripting inside a Jupyter notebook can retrieve and combine open data such as council boundaries and office locations to produce an optimised research path (i.e. where to drive to minimise distance and maximise lamington research benefits), and how much faster this approach is than manually wrangling data in spreadsheets and text files.
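To make the scripting side concrete, the sketch below shows the kind of notebook cell involved once the open data has been retrieved: a greedy nearest-neighbour ordering of council office visits. The council names and coordinates are invented for the example; a real notebook would first pull them from the relevant open data portal.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def nearest_neighbour_route(start, offices):
    """Greedy route: repeatedly visit the closest unvisited office."""
    route, remaining = [], dict(offices)
    here = start
    while remaining:
        name = min(remaining, key=lambda n: haversine_km(here, remaining[n]))
        here = remaining.pop(name)
        route.append(name)
    return route

# Illustrative coordinates only -- a real notebook would fetch these
# from an open data portal (council office locations).
offices = {
    "Monash":    (-37.92, 145.13),
    "Melbourne": (-37.81, 144.96),
    "Geelong":   (-38.15, 144.36),
}
print(nearest_neighbour_route((-37.91, 145.13), offices))
# → ['Monash', 'Melbourne', 'Geelong']
```

Nearest-neighbour is only a heuristic, but for a few dozen council offices it is more than good enough, and it replaces hours of manual spreadsheet wrangling with a few reproducible lines.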

[1] Exaggerated, excessive, hyperbolical.


Biography:

Kerri Wait is an HPC Consultant at Monash University. As an engineer, Kerri has a keen interest in pulling things apart and reassembling them in novel ways. She applies the same principles to her work in eResearch, and is passionate about making scientific research faster, more robust, and repeatable by upskilling user communities and removing entry barriers. Kerri currently works with the neuroscience and bioinformatics communities.

Machine learning for the rest of us

Dr Chris Hines1

1Monash Eresearch Centre, Clayton, Australia

 

Neural Networks are the new hawtness in machine learning and more generally in any field that relies heavily on computers and automation. Many people feel its promise is overhyped, but there is no denying that the automated image processing available is astounding compared to ten years ago. While the premise of machine learning is simple, obtaining a large enough labeled dataset, creating a network and waiting for it to converge before you see a modicum of progress is beyond most of us. In this talk I consider a hypothetical automated kiosk called “Beerbot”. Beerbot’s premise is stated simply: keep a database of how many beers each person has taken from the beer fridge. I show how existing open source published networks can be chained together to create a “good enough” solution for a real world situation with little data collection or labeling required by the developer and no more skill than a bit of basic python. I then consider a number of research areas where further automation could significantly improve “time to science” and encourage all eResearch practitioners to have a go.
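The chaining idea can be sketched in a few lines of basic Python. The stub functions below stand in for pretrained open source networks (a face detector feeding an identification network); the function names and the dict “database” are illustrative, not the actual Beerbot code.

```python
# Sketch of chaining existing networks: each stage is a stub standing in
# for a pretrained open source model. In a real pipeline, detect_faces
# would be a pretrained face detector and identify an embedding network
# plus nearest-neighbour lookup against enrolled faces.

def detect_faces(frame):
    # Stand-in for a detector returning cropped face regions.
    return frame.get("faces", [])

def identify(face, known_people):
    # Stand-in for an embedding network + nearest-neighbour matching.
    return face if face in known_people else "unknown"

def beerbot_update(frame, tally, known_people):
    """Chain detector -> identifier, then charge one beer per face seen."""
    for face in detect_faces(frame):
        person = identify(face, known_people)
        tally[person] = tally.get(person, 0) + 1
    return tally

tally = {}
frame = {"faces": ["alice", "bob", "mallory"]}
beerbot_update(frame, tally, known_people={"alice", "bob"})
print(tally)  # → {'alice': 1, 'bob': 1, 'unknown': 1}
```

The point is that none of the stages has to be trained by the developer: each is a published network used as-is, and the glue code is the only part written in-house.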


Biography:

Chris has been kicking around the eResearch sector for over a decade. He has a background in quantum physics and, with the arrogance of physicists everywhere, thinks this qualifies him to stick his big nose into topics he knows nothing about.

Bending the Rules of Reality for Improved Collaboration and Faster Data Access

David Hiatt

WekaIO, San Jose, CA, United States, Dave@Weka.IO

 

The popular belief is that research data is heavy, therefore, data locality is an important factor in designing the appropriate data storage system to support research workloads. The solution is often to locate data near compute and depend on a local file system or block storage for performance. This tactic results in a compromise that severely limits the ability to scale these systems with data growth or provide shared access to data.

Advances in technology such as NVMe flash, virtualization, distributed parallel file systems, and low latency networks leverage parallelism to bend the rules of reality and provide faster than local file system performance with cloud scalability. The impact on research is to greatly simplify and reduce the cost of HPC class storage, meaning researchers spend less time waiting on results and more of their grant money goes to research rather than specialty hardware.


Biography:

David Hiatt is the Director of Strategic Market Development at WekaIO, where he is responsible for developing business opportunities within the research and high-performance computing communities. Previously, Mr. Hiatt led market development activities in healthcare and life sciences at HGST’s Cloud Storage Business Unit and Violin Memory. He has been a featured speaker on data storage related topics at numerous industry events. Mr. Hiatt earned an MBA from the Booth School of Management at the University of Chicago and a BSBA from the University of Central Florida.

Requirements On a Group Registry Service in Support of Research Activity Identifiers (RAiDs)

Dr Scott Koranda1, Dr Andrew Janke2, Ms Heather Flanagan1, Ms Siobhann Mccafferty3, Mr Benjamin Oshrin1, Mr Terry Smith3

1Spherical Cow Group, Wauwatosa, United States, skoranda@sphericalcowgroup.com, hlflanagan@sphericalcowgroup.com, benno@sphericalcowgroup.com

2Research Data Services, Brisbane, Australia, andrew.janke@uq.edu.au

3Australian Access Federation, Brisbane, Australia, siobhann.mccafferty@aaf.edu.au, t.smith@aaf.edu.au

 

Persistent Identifiers (PIDs) are an essential tool of digital research data management and the evolving data management ecosystem. They allow for a clear line of sight along data management processes and workflows, more efficient collaboration, and more precise measures of cooperation, impact, value and outputs. The Research Activity Identifier (RAiD) [1] was developed by the Australian Data Life Cycle Framework Project (DLCF) [2] in response to this need and is a freely available service and API. A RAiD is a persistent handle (a string of numbers) minted via the RAiD API; other digital identifiers can be associated with it, such as ORCiDs [3], DOIs [4], and Group Identifiers (GiDs).

The minting, structure, management, and curation of GiDs are open and evolving issues. We present the program of work and results from a study of these issues around GiDs, undertaken as a collaboration between the Research Data Services (RDS) [5] project and the Australian Access Federation (AAF) [6]. The study focused on supporting the group management needs of Australian research collaborations, services, and infrastructure and included use cases and user stories from the National Imaging Facility (NIF) [7], AARNET CloudStor [8], the UQ Data Management Planning system (UQ RDM) [9], and the Research Data Box (ReDBox) [10] from the Queensland Cyber Infrastructure Foundation (QCIF).

We report on requirements for a group registry service to serve as the foundation for a GiD API and detail what future enhancements to the group registry service will be necessary to support collaboration across international boundaries via services federated with eduGAIN through AAF subscriptions.

REFERENCES

  1. Available at: https://www.raid.org.au/, accessed 06 June 2018.
  2. Data Life Cycle Framework Project. Available at: https://www.dlc.edu.au/, accessed 06 June 2018.
  3. Available at: https://orcid.org/, accessed 06 June 2018.
  4. Available at: https://www.doi.org/, accessed 06 June 2018.
  5. Available at: https://www.rds.edu.au/, accessed 06 June 2018.
  6. Available at: https://aaf.edu.au/, accessed 06 June 2018.
  7. Available at: http://anif.org.au/, accessed 06 June 2018.
  8. Available at: https://www.aarnet.edu.au/network-and-services/cloud-services-applications/cloudstor, accessed 06 June 2018.
  9. Available at https://research.uq.edu.au/project/research-data-manager-uqrdm, accessed 06 June 2018.
  10. Available at https://www.qcif.edu.au/services/redbox, accessed 06 June 2018.

Biographies:

Andrew Janke is the Informatics Fellow for the National Imaging Facility (NIF), Systems Architect for the DLCF and Research Data Services (RDS), and Senior Research Fellow at the Centre for Advanced Imaging (CAI), University of Queensland. https://orcid.org/0000-0003-0547-5171

Scott Koranda specializes in identity management architectures that streamline and enhance collaboration for research organizations. https://orcid.org/0000-0003-4478-9026

Siobhann McCafferty is a Brisbane based Research Data Management specialist. She is the Project Manager for the Data LifeCycle Project (https://www.dlc.edu.au/) and part of the RAiD Research Activity Identifier Project (https://www.raid.org.au/).  https://orcid.org/0000-0002-2491-0995

Terry Smith is responsible for providing support and training activities to the AAF subscriber community and international engagement across the Asia Pacific region as chair of the APAN Identity and Access management working groups and globally through eduGAIN. https://orcid.org/0000-0001-5971-4735

Towards ‘end-to-end’ research data management support

Mrs Cassandra Sims1

1Elsevier, Chatswood, Australia, c.sims@elsevier.com 

 

Information systems supporting science have come a long way and include solutions that address many research data management needs faced by researchers, as well as their institutions. Yet, due to a fragmented landscape and even with the best solutions available, researchers and institutions are sometimes missing crucial insights and spending too much time searching, combining and analysing research data [1].

With this in mind, we are working to holistically address all aspects of the research lifecycle, as shown in Figure 1. The research lifecycle starts in the design phase, when researchers decide on a new project to work on, prepare their experiments and collect initial data. It then moves into the execution phase, when research experiments are run and research data is collected, shared within the research group, processed, analysed and enriched. Finally, research results are published and the main research outcomes are shared within the scientific community’s networks.

Figure 1: Research lifecycle

Throughout this process researchers use a variety of tools, both within the lab as well as to share their results. Research processes like this happen every day. However, there are no current solutions that enable end-to-end support of this process for researchers and institutions.

Many institutes have established internal repositories, which have their own limitations. At the same time, various open data repositories [2] have grown with their own set of data and storage/retrieval options, and many scholarly publishers now offer services to deposit and reference research datasets in conjunction with the article publication.

One challenge often faced by research institutes is developing and implementing solutions to ensure that researchers can find each other’s research in the various data silos in the ecosystem (i.e. assigning appropriate ontologies, metadata, researcher associations). Another challenge is to increase research impact and collaboration both inside and outside their institution to improve quantity and quality of their research output.

Making data available online can enhance the discovery and impact of research. The ability to reference details, such as ownership and content, about research data could assist in improved citation statistics for published research [3]. In addition, many funders increasingly require that data from supported projects is placed in an online repository. So research institutes need to ensure that their researchers comply with these requirements.

This talk will cover a suite of tools and services developed to assist researchers and institutions with their research data management needs [4], covering the entire spectrum: beginning with data capture and ending with making data comprehensible and trusted, enabling researchers to get proper recognition and institutions to improve their overall ranking by going “beyond the mandates”.

I will explain how it integrates through open application programming interfaces with the global ecosystem for research data management (shown in Figure 2), including:

  1. DANS [7] for long-term data preservation,
  2. DataCite [5] for DOIs and indexed metadata to help with data publication and inventory,
  3. Scholix [6] for support of links between published articles and datasets,
  4. More than 30 open data repositories for data discovery.
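As a small illustration of the API side of such an integration, fetching a dataset’s metadata record from DataCite’s public REST API is a single GET against `api.datacite.org`; the DOI below is a made-up placeholder.

```python
from urllib.parse import quote
from urllib.request import Request

# DataCite exposes DOI metadata through its public REST API.
# The DOI used here is a placeholder, not a real dataset.
def datacite_metadata_request(doi):
    """Build the GET request for a DOI's metadata record."""
    url = "https://api.datacite.org/dois/" + quote(doi, safe="")
    # DataCite's REST API serves JSON:API documents.
    return Request(url, headers={"Accept": "application/vnd.api+json"})

req = datacite_metadata_request("10.5061/dryad.example")
print(req.full_url)
# → https://api.datacite.org/dois/10.5061%2Fdryad.example
```

Opening the request (e.g. with `urllib.request.urlopen`) returns the DOI’s indexed metadata, which is the raw material for the data inventory and citation links described above.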

Figure 2: Integration with the global research data management ecosystem

The talk will conclude with the overview of the current data sharing practices and a short demonstration of how we incorporate feedback from our development partners: University of Manchester, Rensselaer Polytechnic Institute, Monash University and Nanyang Technological University.

REFERENCES

  1. de Waard, A., Cousijn, H., and Aalbersberg IJ. J., 10 aspects of highly effective research data. Elsevier Connect. Available from https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data, accessed 15 June 2018.
  2. Registry of research data repositories. Available from: https://www.re3data.org/, accessed 15 June 2018.
  3. Vines, T.H. et al., The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 2014, 24(1): p. 94-97.
  4. Elsevier research data management tools and services. Available from: https://www.elsevier.com/solutions/mendeley-data-platform, accessed 15 June 2018.
  5. DataCite. Available from: https://www.datacite.org/, accessed 15 June 2018.
  6. Scholix: a framework for scholarly link exchange. Available from http://www.scholix.org/, accessed 15 June 2018.
  7. Data Archiving and Networked Service (DANS). Available from: https://dans.knaw.nl/en, accessed 15 June 2018.

Biography:

Senior Research Solutions Manager ANZ

Cassandra has worked for Elsevier for over six years, as Product Solutions Manager APAC and currently as Senior Research Solutions Manager ANZ. Cassandra has demonstrated experience and engagement across the Academic, Government and Health Science segments in the region, working with universities, government organisations, local area health districts, funders and industry to assist in the development of business strategies, data asset management and core enterprise objectives. Specialising in detailed analytics, collaboration mapping and bibliometric data, Cassandra builds on her wealth of knowledge in these areas to assist customers with innovative solutions to meet their ever-changing needs. Cassandra has worked with the NHMRC, ARC, MBIE, RSNZ, AAMRI and every university in the ANZ region. Cassandra is responsible for all new business initiatives in ANZ and for supporting strategic initiatives across APAC.

CILogon 2.0: An Integrated Identity and Access Management Platform for Science

Dr Jim Basney2, Ms Heather Flanagan1, Mr Terry Fleury2, Dr Scott Koranda1, Dr Jeff Gaynor2, Mr Benjamin Oshrin1

1Spherical Cow Group, Wauwatosa, United States, hlflanagan@sphericalcowgroup.com, skoranda@sphericalcowgroup.com, benno@sphericalcowgroup.com

2University of Illinois, Urbana, United States, jbasney@illinois.edu, tfleury@illinois.edu, gaynor@illinois.edu

 

When scientists work together, they use web sites and other software to share their ideas and data. To ensure the integrity of their work, these systems require the scientists to log in and verify that they are part of the team working on a particular science problem.  Too often, the identity and access verification process is a stumbling block for the scientists. Scientific research projects are forced to invest time and effort into developing and supporting Identity and Access Management (IAM) services, distracting them from the core goals of their research collaboration.

CILogon 2.0 provides a software platform that enables scientists to work together to meet their IAM needs more effectively so they can allocate more time and effort to their core mission of scientific research. The platform builds on prior work from the CILogon [1] and COmanage [2] projects to provide an integrated IAM platform for cyberinfrastructure, federated worldwide via InCommon [3] and eduGAIN [4]. CILogon 2.0 serves the unique needs of research collaborations, namely the need to dynamically form collaboration groups across organizations and countries, sharing access to data, instruments, compute clusters, and other resources to enable scientific discovery.

We operate CILogon 2.0 via a software-as-a-service model to ease integration with cyberinfrastructure, while making all software components publicly available under open source licenses to enable reuse. We present the design and implementation of CILogon 2.0, along with operational performance results from our experience supporting over four thousand active users.
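Because CILogon presents itself to applications as a standard OpenID Connect provider, integrating a science gateway is largely a matter of constructing the usual OIDC requests. The sketch below builds an authorization request; the endpoint path is an assumption based on CILogon’s public OIDC interface, and the client id and redirect URI are placeholders that a real deployment would obtain by registering its client with CILogon.

```python
from urllib.parse import urlencode

# CILogon acts as an OpenID Connect provider federated via
# InCommon/eduGAIN. Endpoint path assumed from CILogon's OIDC docs;
# client_id and redirect_uri are illustrative placeholders.
AUTHORIZE_ENDPOINT = "https://cilogon.org/authorize"

def authorization_url(client_id, redirect_uri, state):
    """Build the OIDC authorization-code-flow request URL."""
    params = {
        "response_type": "code",          # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",  # standard OIDC scopes
        "state": state,                   # CSRF protection token
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)

url = authorization_url("cilogon:/client_id/example",
                        "https://app.example.org/callback",
                        "xyz123")
print(url.startswith("https://cilogon.org/authorize?response_type=code"))  # → True
```

The user is redirected to this URL, authenticates at their home institution via the federation, and the application receives a code it can exchange for identity tokens, which is the “log in and verify” step the abstract describes, without the project running its own IAM stack.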

REFERENCES

  1. Available at: http://www.cilogon.org/, accessed 07 June 2018.
  2. Available at: https://spaces.internet2.edu/display/COmanage/Home, accessed 07 June 2018.
  3. Available at: https://www.incommon.org/, accessed 07 June 2018.
  4. Available at: https://edugain.org/, accessed 07 June 2018.

Biography:

Scott Koranda specializes in identity management architectures that streamline and enhance collaboration for research organizations. https://orcid.org/0000-0003-4478-9026

Managing Your Data Explosion

Michael Cocks1

1Country Manager – ANZ, Spectra Logic, mikec@spectralogic.com

 

As high performance computing (HPC) environments, universities, and research organizations continually test the limits of technology and require peak performance from their equipment, the volume of data created each day continues to grow exponentially. It is essential for these organizations to consider future needs when examining storage options. Short-term fixes to store and manage data are appealing due to their low entry point, but often worsen long-term storage challenges associated with performance, scalability, cost, and floor space. A future-looking data storage solution for HPC requires:

  1. A multi-tier architecture spanning disk, tape, and cloud
  2. Fully integrated clients that are easy to use and support the seamless transfer, sharing and publication of very large data sets from online, nearline and offline storage across diverse sites and systems
  3. The capability to plan for growth, scale incrementally, and span the entire data lifecycle

This presentation will cover the advantages of a fully integrated multi-tier HPC data storage architecture and how such solutions help organizations dealing with massive storage management push the boundaries of their operational objectives, providing cost-effective storage that meets their performance, growth, and environmental needs.
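The “span the entire data lifecycle” requirement is typically realised as an age- or policy-based tiering rule that migrates data from fast to cheap storage as it cools. The toy sketch below is illustrative only; the tier names and thresholds are assumptions, not the actual product logic.

```python
# Toy age-based tiering policy: files migrate disk -> tape -> cloud
# as they age. Thresholds (days) and tier names are illustrative.
TIERS = [("disk", 0), ("tape", 30), ("cloud", 365)]

def tier_for(age_days):
    """Pick the deepest tier whose age threshold the file has passed."""
    chosen = TIERS[0][0]
    for tier, min_age in TIERS:
        if age_days >= min_age:
            chosen = tier
    return chosen

print(tier_for(2))    # → disk
print(tier_for(90))   # → tape
print(tier_for(400))  # → cloud
```

In a real multi-tier system the policy engine also handles recall (moving cold data back to disk on access), which is what lets the tiers appear to users as one seamless namespace.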

Figure 1: A multi-tier hybrid storage ecosystem


Biography:

Michael Cocks is the Country Sales Manager for Spectra Logic in Australia and New Zealand. With more than 25 years of experience in the industry, Michael has held various roles within computing and data storage companies such as Silicon Graphics (SGI), Hitachi Data Systems (HDS), Olivetti and Spectra Logic. At Spectra, he manages relations with several customers in the Australia and NZ area, including Fox Sports, Foxtel, Weta Digital, Post Op Group, TVNZ, Australian Square Kilometre Array Pathfinder, Australian National University, UTAS, CSIRO and many others. Michael graduated from Southampton University in the UK where he studied Electronics Engineering.

About the conference

eResearch Australasia provides opportunities for delegates to engage, connect, and share their ideas and exemplars concerning new information-centric research capabilities, and how information and communication technologies help researchers to collaborate, collect, manage, share, process, analyse, store, find, understand and re-use information.
