Are Large Language Models Ready for Materials Science?

Mr. Tong Xie1,2, Mrs. Yuwei Wan2, Mrs. Yixuan Liu2, Dr. Dongzhan Zhou2, Prof. Wenjie Zhang1, Prof. Bram Hoex1

1University of New South Wales, Kensington, Australia, 2GreenDynamics, Sydney, Australia

Biography:

Tong Xie is a PhD candidate at the School of Photovoltaic and Renewable Energy Engineering (SPREE), UNSW Sydney, and has been recognized as one of the Top 10 HPC AI-Talents of Australia’s National Computational Infrastructure. As the CEO of GreenDynamics and the Group Lead of UNSW AI4Science, he is pioneering the use of generative AI to accelerate the discovery and development of sustainable materials. His expertise spans natural language processing and materials science. He also founded DARWIN, a natural science language model, as part of his work on advancing AI in materials science.

Abstract:

The transition to AI-powered automation has become a pivotal focus in recent years, with large language models (LLMs) revolutionizing various domains. This study explores fine-tuning LLMs for non-language downstream tasks in materials science and investigates diverse training strategies to enhance performance.

We propose a novel approach that incorporates structured and unstructured scientific data from public datasets and literature into open-source models. The Scientific Question Answering Generation (SciQAG) model automates the generation of instructions from scientific texts, efficiently extracting knowledge without relying on manual extraction or domain-specific knowledge graphs. Additionally, we investigate multi-task training strategies that leverage the interdisciplinary nature of materials science, demonstrating superior predictive performance compared to single-task training.
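As a rough illustration of the idea, the sketch below shows one way records from several tasks could be cast into instruction-style examples and mixed into a single multi-task training set. It is a minimal sketch under stated assumptions, not the authors' pipeline: the task names, templates, toy records, and the helper functions `to_instruction` and `build_multitask_set` are all hypothetical.

```python
import random

# Hypothetical instruction templates, one per task (not taken from the study).
TEMPLATES = {
    "band_gap": "Predict the band gap in eV of the given material.",
    "is_metal": "Answer yes or no: is the given material a metal?",
}

# Toy records for illustration only: (material, answer) pairs per task.
RECORDS = {
    "band_gap": [("Si", "1.12"), ("GaAs", "1.42")],
    "is_metal": [("Cu", "yes"), ("Si", "no")],
}


def to_instruction(task, material, answer):
    """Wrap one structured record as an instruction-style training example."""
    return {
        "task": task,
        "instruction": TEMPLATES[task],
        "input": material,
        "output": answer,
    }


def build_multitask_set(records, seed=0):
    """Convert every task's records and shuffle them into one mixed dataset."""
    examples = [
        to_instruction(task, material, answer)
        for task, rows in records.items()
        for material, answer in rows
    ]
    # A seeded shuffle interleaves the tasks reproducibly.
    random.Random(seed).shuffle(examples)
    return examples


dataset = build_multitask_set(RECORDS)
```

Each entry of `dataset` pairs a natural-language instruction with a structured input and target, so a regression task and a classification task can share the same fine-tuning format.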

Extensive experiments on 23 scientific tasks relevant to materials science, covering semiconductors, polymers, metal-organic frameworks, and other material classes, show that our LLMs achieve state-of-the-art performance, surpassing existing baselines. By relying on open-source models, our approach promotes transparency and reproducibility in the scientific community.

The implications of this research extend beyond materials science, as the methodology can be adapted to other scientific domains. We aim to inspire further research and development in AI for science, enabling researchers to leverage LLMs to tackle complex scientific challenges and drive innovation in materials science and beyond.

 
