Miss Priti Priti¹
¹BRIC-National Institute of Plant Genome Research, New Delhi, India
Biography:
My name is Priti. I am a PhD student at BRIC-NIPGR, New Delhi, working on building AI to decipher the chemical language of plants. I work in the Biodiversity Informatics Laboratory of NIPGR under the supervision of Dr. Yadav, and my thesis focuses on Mapping the Phytochemical Landscape using Machine Intelligence on Literature Metadata.
Abstract:
Plants constantly synthesize diverse metabolites, each playing a distinct role in defense or communication; together, these are now regarded as unique “chemical spectra-specific fingerprints.” Beyond helping to unravel the functions and significance of plant secondary metabolites, bioprospecting for them has multifaceted applications. However, much of the information on natural products research remains locked in published literature spanning centuries, largely because scholarly records are copyrighted, which hinders accessibility and integration across scientific domains.
We are curating and normalising plant-chemistry n-grams from 107 million research articles into a searchable database. So far, we have identified 617,263 chemical-plant pairs, sourced from 383,793 unique DOIs and covering 6,647 plants and 1,725 chemicals. Available data on the physicochemical properties of the compounds and on plant taxonomy have also been incorporated, enhancing data accuracy. Integration with Wikidata IDs has enabled embedding this knowledge into the “Global Knowledge Graph,” strengthening accessibility and inviting collaborative data curation within the community.
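As a rough illustration of how chemical-plant pairs might be collected from normalised n-grams, the sketch below matches unigrams and bigrams against two small vocabularies and records the DOIs in which a plant and a chemical co-occur. It is a minimal sketch, not the pipeline used in this work: the vocabularies, the `normalise` and `extract_pairs` helpers, and the example DOIs are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical vocabularies; a real project would load curated lists.
PLANTS = {"camellia sinensis", "coffea arabica"}
CHEMICALS = {"caffeine", "quercetin"}


def normalise(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s-]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def ngrams(tokens, n):
    """Return all contiguous n-token strings from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_pairs(records, max_n=2):
    """Map each (plant, chemical) pair to the set of DOIs mentioning both."""
    pairs = defaultdict(set)
    for doi, text in records:
        tokens = normalise(text).split()
        grams = set()
        for n in range(1, max_n + 1):
            grams.update(ngrams(tokens, n))
        # Co-occurrence within one record is treated as a candidate pair.
        for plant in grams & PLANTS:
            for chemical in grams & CHEMICALS:
                pairs[(plant, chemical)].add(doi)
    return dict(pairs)


# Placeholder records with made-up DOIs, for illustration only.
records = [
    ("10.0000/example.1", "Caffeine content of Camellia sinensis leaves."),
    ("10.0000/example.2", "Quercetin and caffeine in Coffea arabica."),
    ("10.0000/example.3", "Caffeine biosynthesis in Camellia sinensis."),
]
pairs = extract_pairs(records)
```

Keying the result on the pair and storing a set of DOIs keeps both counts recoverable: the number of distinct pairs and the number of unique supporting sources.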
We have also searched OpenAlex for 350K plant species, which returned results for 250K species and 2.5M DOIs. We will perform Named Entity Recognition using spaCy to extract chemical entities from these records. The extracted entities will then be mapped to external identifiers such as PubChem, MeSH, GBIF, and the Catalogue of Life, ensuring semantic interoperability.
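The identifier-mapping step can be pictured as a lookup over normalised entity names. The sketch below assumes the Named Entity Recognition step has already produced a list of chemical mentions; it does not use spaCy itself. The `canonical` and `link_entities` helpers and the small lookup tables are hypothetical, and the PubChem CIDs shown should be verified against PubChem before reuse.

```python
import unicodedata

# Illustrative lookup tables (assumptions, not the project's data).
PUBCHEM_CID = {"caffeine": 2519, "quercetin": 5280343}
SYNONYMS = {"1,3,7-trimethylxanthine": "caffeine"}


def canonical(name):
    """Normalise Unicode, case, and whitespace, then resolve known synonyms."""
    name = unicodedata.normalize("NFKC", name).strip().lower()
    name = " ".join(name.split())
    return SYNONYMS.get(name, name)


def link_entities(mentions):
    """Attach a PubChem CID to each mention, or None when no match is known."""
    linked = []
    for mention in mentions:
        key = canonical(mention)
        linked.append(
            {"mention": mention, "name": key, "pubchem_cid": PUBCHEM_CID.get(key)}
        )
    return linked


linked = link_entities(["Caffeine ", "1,3,7-Trimethylxanthine", "unknownol"])
```

Unmatched mentions are kept with a `None` identifier rather than dropped, so they can be reviewed or resolved later against another source such as MeSH.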
We aim to seamlessly integrate biology and chemistry by profiling the metabolic diversity of all known plants and openly sharing this data. This initiative has the potential to revolutionise natural products research and broaden its applications beyond the field.