Unlocking health data using natural language processing pipelines

Ms Farnoosh Sadeghian¹, Dr Adam Morris¹, Mr Jerico Revote¹, Ms Hilah Lerer¹, Ms Anitha Kannan¹

¹Monash University, Clayton, Australia

Health services hold an enormous amount of information about patients, procedures, and more, but only a fraction of it is ever used for clinical decision-making or research. This is because most of the information is stored as unstructured documents, which precludes simple database searching and analysis and often requires manual inspection by staff.

We are working with a group of Australian health-service providers to deploy and develop a suite of open-source Artificial Intelligence (AI) and dashboard tools that unlock information from Electronic Medical Records (EMR) and other text data. The result is a Natural Language Processing (NLP) and analysis pipeline that can support clinical and research applications.

We have built a project team at Monash University, comprising a data scientist, a data engineer, a DevOps engineer and a business analyst, who work closely with technical and clinical experts in health organisations. The aims of this collaboration are to 1) understand hospital data, 2) specify the use cases that would benefit most from NLP and AI, 3) deploy an open-source ecosystem in the hospital environment, and 4) develop AI algorithms that use free-text data to generate meaningful insights for health organisations.

Challenges in this process include gaining access to patient data, working with a different infrastructure at each organisation, and identifying, within each organisation, a gap in research or clinical practice that NLP and AI could address as a proof of concept.

To date, we have deployed the ecosystem in multiple health organisations, pre-processed 13 million EMR documents, and annotated the data using NLP methods.


Adam is a senior researcher and data specialist. He uses data analytics and modelling to make sense of complex systems, machine learning and AI to predict outcomes from image, text, and numerical data, and automated pipelines for efficient and accurate data science. His academic background is in computational and sensory neuroscience.

Farnoosh has a diverse background, spanning software engineering, neuroimaging data analysis, and the management and linkage of sensitive data. She is currently part of multiple projects at eResearch and the Helix Platform at Monash University that provide Australian researchers with improved environments for data processing.


Oct 14 2021, 1:00 pm - 1:20 pm