NLP AI healthcare unstructured data

Natural Language Processing is the New Revolution in Healthcare AI

After a late start, Artificial Intelligence (AI) has made massive inroads in penetrating the pharmaceutical and healthcare industries. Nearly all aspects of the industry have felt impacts from the increasing uptake of the technologies – from visual and image diagnostics, to Machine Learning and Neural Networks for drug discovery, data analysis and other models. One of the more recently ascendant forms of AI in healthcare has been Natural Language Processing (NLP) – which promises to revolutionize digitization, communication and analyzing human-generated data. We explore the foundations of the technology and its current and future applications in the industry.

What is Natural Language Processing?

NLP refers to the practice of using computer methods to process language in the form generated by humans – often disordered, flexible and adaptable. The phenomenon is not limited to AI technology; it first originated in the 1950s with symbolic NLP. This was a rudimentary concept which was originally intended for machine translation applications – first demonstrated by the 1954 Georgetown Experiment. Symbolic NLP was largely founded on complex, manually defined rulesets: it soon became apparent that the unpredictability and fluidity of human language could not truly be defined by concise rules. 

With the exponential growth in computing power, symbolic NLP soon gave rise to statistical NLP – largely pioneered by IBM and their Alignment Models for machine translation. IBM developed six such models, which enabled a more flexible approach to machine-based translation. Other companies soon followed, and statistical NLP evolved into what we know today as self-learning NLP, underpinned by machine learning and neural networks. The developments in its ability to recognize and process language have put it to use in fields far more diverse than translation – although it continues to make improvements there too. 

While symbolic NLP is still often employed when datasets are small, statistical NLP has been largely replaced by neural network-enabled NLP. This is due to how neural networks can simplify the process of constructing functional models. The trade-off lies in the opacity of how they operate – while statistical methods will always be fully transparent and the path to how they obtain their results will be fully visible, neural network models are often more of a “black box”. Their power in interpreting human language is not to be underestimated however – from speech recognition, including the smart assistants we have come to rely on, to translations and text analytics, NLP promises to bridge many gaps.

Current Models in Healthcare

One of the most obvious applications of NLP in the healthcare industry is processing written text – whether that be analog or digital. A leading source of data heterogeneity, which often prevents downstream analysis models from directly utilizing datasets, is the different terminology and communication used by healthcare practitioners. Neural-enabled NLPs can condense such unstructured data into directly comparable terms suitable for use in data analysis. This can be seen in models inferring International Classification of Diseases (ICD) codes based on records and medical texts.

Medical records present rich datasets that can be harnessed for a plethora of applications with the appropriate NLP models. Medical text classification using NLPs can assist in classifying disease based on common symptoms – such as identifying the underlying conditions causing obesity in a patient. Models such as this can then be later used to predict disease in patients with similar symptoms. This could prove particularly revolutionary in diseases such as sepsis – which has early symptoms that are very common across a number of conditions. Recurrent Neural Network NLP models using patient records to predict sepsis showed a high accuracy with lower false alarm rates than current benchmarks. 

These implementations are also critical in clinical development. But clinical operations also generate another source of unstructured data: adverse event reports, which form the basis of pharmacovigilance and drug safety. We already explored the applications of AI models in pharmacovigilance in a different article – but their introduction to that field particularly highlights the need for increased cooperation with regulatory authorities to ensure all stakeholders remain in lockstep as we increasingly adopt AI. 

Beyond Healthcare

But NLP can also be applied beyond human language. Exploratory studies have also shown its potential in understanding artificial languages, such as protein coding. Proteins, strings of the same 20 amino acids presenting in variable order, share many similarities with human language. Research has shown that language generation models trained on proteins can be used for many diverse applications, such as predicting protein structures that can evade antibodies. 

There are other sources of unstructured, heterogeneous data that are simply too big to pore over with human eyes in cost-efficient manners. Science literature can be one of these – with countless journal articles floating around libraries and the web. NLP models have previously been employed by companies such as Linguamatics to capture valuable information from throughout the corpus of scientific literature to prioritize drug discovery efforts.

Quantum computing also represents a major growing technology, and firms are already seeking to combine it with NLP. One example is Quantinuum’s λambeq library, which can convert natural language sentences to quantum circuits for use in advanced computing applications. Future endeavors in the area promise massive advancements in text mining, bioinformatics and all the other applications of NLP.

Research conducted by Gradient Flow has shown that Natural Language Processing is the leading trend in AI technologies for healthcare and pharma. This is for good reason – AI can prove useful in a cornucopia of different implementations, but NLP is what facilitates the use of heterogeneous, fragmented datasets in the same model. Integrating existing and historical datasets, or new datasets generated in inherently unstructured manners – articles, records and medical observations, will remain crucial in the progress of other AI technologies. NLP is what enables that – and future advancements are likely to see its prominence rise on its own, as well.

Author: Nick Zoukas

Source: PharmaFeatures