The arrival of natural language processing in healthcare

“Writing in English is like throwing mud at a wall.”
― Joseph Conrad

I recently watched the 2016 movie “Arrival.” The film explores the idea that what you think and how you think may actually be closely intertwined. “Arrival” is a story about humanity’s first contact with aliens and how a pair of scientists finds ways to communicate without a common language. As they spend more and more time with the octopus-like creatures, they grow increasingly frustrated with their lack of progress and must get creative in order to communicate effectively with these new visitors to Earth. I won’t spoil it for you, but this film beautifully illustrates how powerful and difficult the use of language can be, whether it’s between a linguist and a 10-foot-tall mollusk or among ourselves.

In a way, Electronic Medical Records (EMRs) can be seen as big repositories of human language about patients. Patient charts are largely written observations, results, and decisions that were once recorded with pen and paper and are now typed into EMRs on a computer keyboard. Depending on the size of the practice or hospital, this means the creation of hundreds or thousands of notes a day, including Progress Notes, Admission Notes, Procedure Notes, Discharge Summaries, Consultation Notes, and more.

These narrative clinical notes are largely entered by providers, nurses, and the rest of the care team, who are essentially writing sentences and paragraphs. These notes are the bread and butter of patient documentation - it’s natural for medical professionals to record what they’re thinking or have discovered in the same manner in which they would tell a colleague in conversation. The design of EMRs did not spring fully formed out of thin air. Like most innovations in technology, these systems were designed predominantly as charge capture repositories, but with the intention of making some improvements to the day-to-day workflows of doctors, nurses, and other patient care professionals. If paper charts hadn’t existed before the invention of EMRs, EMRs might have been designed differently to better leverage computers and perhaps better organize the intake and recording of patient data. But, for the foreseeable future, narrative documentation isn’t going anywhere.

Humans can obviously read individual notes and make decisions based on them, but doing so manually is drastically inefficient. One study found that when hospitalists were reviewing notes, much of the content received little attention or was read very quickly. But even if providers and care teams could take in these notes with perfect accuracy and speed, one of the heralded advantages of EMRs has always been that patient data would be digital, so that computers and their ever-evolving algorithms could ingest and interpret it.

As a result, we’ve seen entire new industries spring up and succeed in doing just that, with Population Health reporting tools that analyze hundreds of patients at a time and Clinical Decision Support tools that provide recommendations at the point of care - just to name a few. However, until recently, the bulk of this kind of data analysis has been limited to the relatively small share of EMR data that is discrete (approximately 20%). Discrete data is typically entered in specific, structured formats, such as vitals or laboratory fields that require a value from a specific range of numbers.

One of the most exciting areas of recent innovation in healthcare technology has been in the field of “Natural Language Processing,” or NLP, which uses cognitive computing algorithms to allow a computer to “read” unstructured text, pick out key words and phrases in context, and “understand” their meaning. This allows computers to tap into the vast, previously unexplored swaths of note data that are simply unreadable by standard tools limited to ingesting only discrete data. Not surprisingly, a vast amount of useful clinical data is found in progress notes, nursing notes, and other free-text notes and is not also documented in discrete fields. And, with Artificial Intelligence, not only can NLP be used to extract written-out thoughts and findings about patients, but it can also be leveraged to find patterns and run analyses that lead to discoveries the doctor or nurse didn’t even realize when writing those notes! All this happens at lightning speed compared to any cost-prohibitive attempt to do this work by hand.
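To make the idea of picking out key phrases “in context” a little more concrete, here is a deliberately tiny Python sketch. It is not how any particular clinical NLP product works. The term list, the negation cues, the three-word window, and the function name `extract_findings` are all assumptions made up for illustration. The sketch looks for a few phrases in a free-text note and checks the words just before each one to guess whether the finding is present or negated.

```python
import re

# Hypothetical term list for illustration only; real systems rely on
# large clinical vocabularies rather than a handful of strings.
TERMS = ["fever", "chest pain", "shortness of breath", "cough"]

# Hypothetical negation cues, checked in the words just before a term.
NEGATION_CUES = ["no", "denies", "without", "not"]

# Assumed context size: how many preceding words to inspect.
WINDOW = 3


def extract_findings(note):
    """Return (term, status) pairs found in a free-text note."""
    findings = []
    # Work sentence by sentence so a cue cannot leak across sentences.
    for sentence in re.split(r"[.!?]+", note.lower()):
        for term in TERMS:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, sentence):
                # The few words before the term are its "context".
                before = re.findall(r"[a-z']+", sentence[:match.start()])
                context = before[-WINDOW:]
                negated = any(cue in context for cue in NEGATION_CUES)
                findings.append((term, "negated" if negated else "present"))
    return findings


if __name__ == "__main__":
    note = "Patient reports chest pain and cough. Denies fever. No shortness of breath."
    for term, status in extract_findings(note):
        print(term, "->", status)
```

Even this toy version shows why context matters: a plain keyword search would flag “fever” in “Denies fever,” while the context check marks it as negated. Real notes are far messier, which is where the statistical and machine learning approaches come in.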

NLP can be used in a variety of applications, such as helping advertisers read social media posts to improve their ad targeting or helping a computer compete in Jeopardy!. And there have even been successes in blurring the line between social data and clinical data, such as a study that combined tweets about asthma with data taken from air-quality sensors and EMR data to predict with 75% accuracy if the Parkland emergency department staff could expect a high, low or medium number of asthma-related visits that day. NLP is being used by many young, innovative companies as a powerful tool to help provide real-time, personalized clinical decision support that identifies medical risks like sepsis or COPD before they occur. This is an emerging and exciting area of study, and NLP combined with machine learning will likely become ever more accurate, powerful, and beneficial in uncovering hidden gems of insight from patient data that previously could not be explored.

Yes, language (especially the English language) can be messy, but with the emergence of NLP there’s hope that computers are starting to bridge the gap to a better understanding of human language, which will allow humans to better understand each other and improve patient care.

Dave Paige, March 8, 2018