A Brief Introduction: Natural Language Processing at Genematics

by | Nov 3, 2017 | Technology

Natural language processing (NLP) is becoming increasingly important. We see its applications in mobile phones, cars, and chatbots. When you contact a company's customer service for support, chances are you are talking to a chatbot. NLP is, however, not limited to these areas. At Genematics Scientific, for example, we use NLP in combination with word vectors to find semantically similar words in large collections of scientific documents.
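To make "semantically similar words" a bit more concrete: with word vectors, every word is represented as a list of numbers, and two words count as similar when their vectors point in nearly the same direction. A common way to measure this is cosine similarity. The sketch below is only an illustration, not our production code; the three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are invented for the example, whereas real word vectors are learned from text and usually have hundreds of dimensions.

```python
import math

# Toy vectors, invented for illustration only. Real word vectors are
# learned from large amounts of text and are much longer.
VECTORS = {
    "ibuprofen": [0.9, 0.1, 0.0],
    "aspirin": [0.8, 0.2, 0.1],
    "couch": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Rank all other words by their similarity to `word`, best match first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("ibuprofen", VECTORS):
        print(f"{other}: {score:.3f}")
```

With these toy numbers, "aspirin" ranks well above "couch" as a neighbor of "ibuprofen", which is the kind of ranking a similarity search over real vectors is meant to produce.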

Before we jump on the bandwagon, let me explain natural language processing a bit more. NLP is the processing of human language by a machine (read: a computer running artificial intelligence software) that is given the words we use every day to communicate. We can, for example, give it an English or Dutch vocabulary to start with. Processing the text documents we provide involves analyzing our written language in such a way that the machine understands the meaning of the words. This understanding can be approached by looking at the position of each word in a given sentence. Take, for example, the following sentence:

“My dog sleeps on the couch”

All words in this sentence are connected in some way. “My dog” implies that a dog can be mine, and “sleeps” tells us something about the state of the dog. If we have enough sentences, we can train a machine to learn our language so that it can recognize the meaning of the sentences and the words they contain.
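One simple way to turn "words are connected by their position" into training data is to pair each word with its neighbors inside a small window, which is the kind of input skip-gram style word vector models learn from. The snippet below is a minimal sketch under that assumption; the function name `context_pairs` and the `window` parameter are made up for this example, and a real pipeline would use a proper tokenizer rather than splitting on spaces.

```python
def context_pairs(sentence, window=2):
    """Return (word, neighbor) pairs for neighbors within `window` positions."""
    tokens = sentence.lower().split()  # naive whitespace tokenization
    pairs = []
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own neighbor
                pairs.append((word, tokens[j]))
    return pairs


if __name__ == "__main__":
    for word, neighbor in context_pairs("My dog sleeps on the couch", window=1):
        print(word, "->", neighbor)
```

With a window of 1, "dog" is paired with "my" and "sleeps" but not with "couch". Collected over many sentences, such pairs tell a model which words tend to appear in the same company.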

At Genematics we are huge fans of NLP, which may not surprise you, as we develop software that mines, analyzes, and presents data in such a way that it is easily interpretable for humans. Our classification techniques are in fact built on word vectors (Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean at Google Brain, 2013), which help us train models on unlabeled data and therefore apply our classification to a broad spectrum of scientific topics. When an end user of our Genematics Cloud Platform enters a query in the search bar, the underlying system automatically identifies the context and subject of the query. An example of what could be asked:

“Would ibuprofen be beneficial for emphysema patients?”

Our system quickly identifies this as a medical query because it recognizes:

  1. Ibuprofen (drug)
  2. Emphysema (disease)
  3. Patient (person)
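The recognition step above can be imitated with a plain dictionary lookup. This is a deliberately simplified sketch and not how the Genematics Cloud Platform works internally: the table `ENTITY_TYPES`, the helpers `normalize` and `tag_entities`, and the crude plural handling are all assumptions made for the example, and a system based on word vectors would not need every term listed by hand.

```python
import re

# Hypothetical lookup table, invented for this example.
ENTITY_TYPES = {
    "ibuprofen": "drug",
    "emphysema": "disease",
    "patient": "person",
}


def normalize(token):
    """Lowercase a token and map a simple plural ("patients") to its singular."""
    token = token.lower()
    if (
        token not in ENTITY_TYPES
        and token.endswith("s")
        and token[:-1] in ENTITY_TYPES
    ):
        return token[:-1]
    return token


def tag_entities(query):
    """Return (term, type) for every known term in the query, in query order."""
    found = []
    for token in re.findall(r"[A-Za-z]+", query):
        key = normalize(token)
        if key in ENTITY_TYPES:
            found.append((key, ENTITY_TYPES[key]))
    return found


if __name__ == "__main__":
    question = "Would ibuprofen be beneficial for emphysema patients?"
    for term, kind in tag_entities(question):
        print(f"{term} ({kind})")
```

For the example question this prints the same three entries as the list above: ibuprofen (drug), emphysema (disease), and patient (person).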

Now that we have identified the subject of the question, we can pass this information on to the other modules of the Genematics Cloud Platform, which process it and present the predicted answer.

Photo by Hope House Press on Unsplash