ByteScience: A Cloud-Based Platform for Accelerating Scientific Research and Informatics

Mr. Shaozhou Wang1,2, Imran Razzak1, Chaokai Lai3, Yuwei Wan2,4, Prof. Wenjie Zhang1, Prof. Bram Hoex1, Mr. Tong Xie1,2

1UNSW Sydney, Australia, 2GreenDynamics Pty. Ltd., Sydney, Australia, 3Alibaba Group, China, 4City University of Hong Kong, Hong Kong, China

Biography:

Tong Xie is a PhD at the School of Photovoltaic and Renewable Energy Engineering (SPREE), UNSW Sydney, acclaimed as one of Australia’s National Computational Infrastructure’s Top 10 HPC AI-Talents. As the CEO of GreenDynamics and the Group Lead of UNSW AI4Science, he is pioneering the use of Generative AI to accelerate the discovery and development of sustainable materials. His expertise extends to Natural Language Processing and Material Science. He also founded the DARWIN natural science language model, demonstrating his innovative approach to advancing AI in material sciences.

Abstract:

Reading the literature and extracting valuable material data and insights from it has long been a significant challenge for material scientists. The difficulties are twofold: 1) the volume of literature is enormous, with hundreds of new papers published daily, so researchers spend substantial time searching for and filtering relevant documents; and 2) most empirical data exists as unstructured text within scientific papers, which makes it hard for existing tools to extract the data accurately, especially in the machine-readable form needed for machine learning-based material predictions. To address these issues, we introduce ByteScience, a cloud-based large language model (LLM) platform designed to 1) quickly answer material science-related questions with high-quality references, providing concise, reliable, and high-granularity answers; and 2) generate a structured material database from a vast scientific corpus for material analysis and prediction. The platform builds on DARWIN, an open-source fine-tuned LLM dedicated to natural science. It is built on Amazon Web Services (AWS) and provides an automated, user-friendly workflow for custom model development and data extraction. We show that the platform can achieve remarkable accuracy with a small number of well-annotated articles. The tool accelerates the process of obtaining knowledge and data from material science papers and simplifies the transition from literature to structured knowledge and data, which promotes the development of materials informatics.
