

ICD-10 Healthcare Codes and Wild Pigs - An Experiment with IBM Watson Natural Language Classifier

POSTED IN: Building Bridges from R to IBM Watson

OINK! Bitten by a wild pig in the left leg!

How would you classify this injury using ICD-10 codes? Below is a proof of concept that uses the IBM Watson Natural Language Classifier (NLC), with the ICD-10 codes as ground truth, to help a nurse or doctor narrow a description down to the most likely codes. The candidates could additionally be scored on how likely each is to be reimbursed by the insurance company.

    class           confidence
 1:   W55  0.33606352760420066
 2:   M79  0.21942966846914216

 

Background

 

ICD-10 is the International Statistical Classification of Diseases and Related Health Problems, 10th Revision. It's a REALLY long list of codes that hospitals and insurance companies use to classify diagnoses: diseases, injuries, symptoms and their external causes.

It runs to 65k+ rows of ailments, which makes it a big and interesting data set.

At a recent IBM Watson team summit, the data set came up in a conversation with a colleague. I was curious whether we could create a QUICK AND DIRTY natural language classifier that would take speech or text as input and output the 10 most likely matching codes.

Benefits of such a system would include:

  • More accurate tagging of PRIMARY and SECONDARY ICD-10 codes, giving better data for all the reasons the codes exist in the first place
  • More profitability for hospitals (assuming better tagging translates into more successful insurance claims)
  • Saved time through less rework and re-classification

    WIKI:  https://en.wikipedia.org/wiki/ICD-10#List

 

SOURCE CODE

CAVEAT:  THIS GROUND TRUTH WAS PREPARED IN 30 MINUTES - IT IS NOT A SOLUTION, BUT RATHER AN EXAMPLE OF ONE PART OF A WIDER APPROACH.

https://github.com/rustyoldrake/IBM_Watson_NLC_ICD10_Health_Codes

BLOG: https://dreamtolearn.com/ryan/r_journey_to_watson/16
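The ground truth for an NLC classifier is a CSV file with no header row, where each row holds a text example followed by its class. A minimal base-R sketch of how such a file could be built from an ICD-10 code list, assuming a data frame with hypothetical columns `code` and `description` (the layout of the file actually used in the repo may differ). Each full code is collapsed to its three-character category (for example `S72.001A` becomes `S72`), so categories like W55 and M79 serve as the classes. The function names are illustrative, not taken from the repo.

```r
# Hypothetical helpers: the column names `code` and `description` are assumptions.

# Collapse a full ICD-10 code to its three-character category:
# drop the dot, keep letter + two digits, e.g. "S72.001A" -> "S72".
icd_to_category <- function(code) {
  toupper(substr(gsub(".", "", code, fixed = TRUE), 1, 3))
}

# One training row per code: trimmed description as text, category as class.
# The 1024-character cap on text is an assumed NLC limit.
build_nlc_training <- function(icd, max_chars = 1024) {
  data.frame(
    text  = substr(trimws(icd$description), 1, max_chars),
    class = icd_to_category(icd$code),
    stringsAsFactors = FALSE
  )
}

# Write UTF-8 CSV with no header and no row names; embedded quotes are doubled.
write_nlc_csv <- function(training, file) {
  write.table(training, file, sep = ",", row.names = FALSE,
              col.names = FALSE, qmethod = "double", fileEncoding = "UTF-8")
}
```

Keeping only the three-character category trades precision for far fewer classes, which suits a quick-and-dirty first pass; the full codes could be reintroduced later as a second-stage classifier.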

 

 

 

ICD-10 Organization

Here's what it looks like at a meta level:

Chapter  Blocks    Title
I        A00–B99   Certain infectious and parasitic diseases
II       C00–D48   Neoplasms
III      D50–D89   Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
IV       E00–E90   Endocrine, nutritional and metabolic diseases
V        F00–F99   Mental and behavioural disorders
VI       G00–G99   Diseases of the nervous system
VII      H00–H59   Diseases of the eye and adnexa
VIII     H60–H95   Diseases of the ear and mastoid process
IX       I00–I99   Diseases of the circulatory system
X        J00–J99   Diseases of the respiratory system
XI       K00–K93   Diseases of the digestive system
XII      L00–L99   Diseases of the skin and subcutaneous tissue
XIII     M00–M99   Diseases of the musculoskeletal system and connective tissue
XIV      N00–N99   Diseases of the genitourinary system
XV       O00–O99   Pregnancy, childbirth and the puerperium
XVI      P00–P96   Certain conditions originating in the perinatal period
XVII     Q00–Q99   Congenital malformations, deformations and chromosomal abnormalities
XVIII    R00–R99   Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
XIX      S00–T98   Injury, poisoning and certain other consequences of external causes
XX       V01–Y98   External causes of morbidity and mortality
XXI      Z00–Z99   Factors influencing health status and contact with health services
XXII     U00–U99   Codes for special purposes
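The block ranges in this table are enough to map any three-character category back to its chapter, which is handy for sanity-checking classifier output. For example, W55, the top class for the wild pig, falls in chapter XX (external causes). A minimal base-R sketch using the ranges exactly as listed above; `icd10_key` and `icd10_chapter` are hypothetical names, not part of the author's repo. Codes are converted to numbers rather than compared as strings, so the result does not depend on the locale's collation order.

```r
# Chapter block ranges copied from the table above (first and last category).
icd10_chapters <- data.frame(
  chapter = c("I","II","III","IV","V","VI","VII","VIII","IX","X","XI",
              "XII","XIII","XIV","XV","XVI","XVII","XVIII","XIX","XX","XXI","XXII"),
  from = c("A00","C00","D50","E00","F00","G00","H00","H60","I00","J00","K00",
           "L00","M00","N00","O00","P00","Q00","R00","S00","V01","Z00","U00"),
  to   = c("B99","D48","D89","E90","F99","G99","H59","H95","I99","J99","K93",
           "L99","M99","N99","O99","P96","Q99","R99","T98","Y98","Z99","U99"),
  stringsAsFactors = FALSE
)

# Turn a category (letter + two digits) into a sortable number:
# "A00" -> 0, "B99" -> 199, "W55" -> 2255. Malformed codes give NA.
icd10_key <- function(code) {
  (match(toupper(substr(code, 1, 1)), LETTERS) - 1) * 100 +
    suppressWarnings(as.integer(substr(code, 2, 3)))
}

# Return the chapter numeral for each code, or NA if no block contains it.
icd10_chapter <- function(code) {
  key  <- icd10_key(code)
  from <- icd10_key(icd10_chapters$from)
  to   <- icd10_key(icd10_chapters$to)
  vapply(key, function(k) {
    hit <- which(!is.na(k) & k >= from & k <= to)
    if (length(hit) == 0) NA_character_ else icd10_chapters$chapter[hit[1]]
  }, character(1))
}
```

For instance, `icd10_chapter(c("W55", "M79", "S90"))` returns `"XX" "XIII" "XIX"`, which shows at a glance that the top candidates span external causes, musculoskeletal disease and injury.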

 

 

TOP TEN CLASSES (taken from the 50 returned by the five classifiers, 10 classes each)

 watson.query.five.classifiers("bitten by a wild pig in the left leg")
    class           confidence
 1:   W55  0.33606352760420066
 2:   M79  0.21942966846914216
 3:   S90  0.20187063115504736
 4:   S72   0.1867607490106112
 5:   M02   0.1849542951462859
 6:   R62  0.18333693860407277
 7:   Z63  0.14295376020063674
 8:   S71  0.13195078900692148
 9:   T16  0.11516054234192707
10:   S59  0.09322628449385714
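The post does not show how the 50 results are reduced to ten. One plausible approach, sketched here as an assumption rather than the author's method: stack the five result tables, keep the highest confidence seen for each class, and sort. Each input is assumed to have `class` and `confidence` columns like the output of `watson.nlc.processtextreturnclass`, where confidence comes back as text because of `type.convert=FALSE`. `merge_top_classes` is a hypothetical name.

```r
# Hypothetical helper: combine per-classifier result tables into one top-n table.
merge_top_classes <- function(results, n = 10) {
  # Stack all tables into one data frame with numeric confidence.
  stacked <- do.call(rbind, lapply(results, function(r) {
    data.frame(class = as.character(r$class),
               confidence = as.numeric(as.character(r$confidence)),
               stringsAsFactors = FALSE)
  }))
  # If several classifiers return the same class, keep its best confidence.
  best <- tapply(stacked$confidence, stacked$class, max)
  out <- data.frame(class = names(best), confidence = as.numeric(best),
                    stringsAsFactors = FALSE)
  # Highest confidence first, then keep the top n rows.
  out <- out[order(out$confidence, decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, n)
}
```

Deduplicating by class is a design choice. Without it, a class that several classifiers agree on could occupy several of the ten slots. An alternative would be to sum or average confidences across classifiers, which rewards agreement but mixes scores from separately trained models.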

 

Code Snippet Below

The rest of the code is here: https://github.com/rustyoldrake/IBM_Watson_NLC_ICD10_Health_Codes

######################################################
### Experimental Code.  Experimental R Interface for IBM Watson Services -
### Focus: Natural Language Classifier - R Programming Language Interface
### Playing with ICD Codes - multiple classifiers - proof of concept (not optimized, and no pre- or post- classifiers in this example)
######################################################


library(RCurl) # install.packages("RCurl") # if the package is not already installed
library(httr)
library(XML)
library(data.table)
library(reshape2)
library(tidyr)
library(dplyr)
library(stringr)
library(splitstackshape)
 

# ICD DEMO - Ryan's Service - will delete Jan 31 2016
username = "c18631c8-ba0c-47e6-YOUR-USERNAME"
password = "YOUR_PASSWORD"
username_password = paste0(username, ":", password)  # "user:password" with no spaces, the format RCurl's userpwd expects

######### Housekeeping And Authentication
setwd("/Users/ryan/Documents/Project ICD-Codes")
getwd()
base_url = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/"
getURL(base_url,userpwd = username_password )


###### FUNCTION CREATE NEW CLASSIFIER - post /v1/classifiers - Creates a classifier with CSV data ## URL below no "/" after base url
watson.nlc.createnewclassifier <- function(file,classifiername) {
  return(POST(url="https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers",
         authenticate(username,password),
         body = list(training_data = upload_file(file),
                     # metadata must be a JSON string, e.g. {"language":"en","name":"ICD10_A"}
                     training_metadata = paste0('{"language":"en","name":"', classifiername, '"}')
         )))}
###### end of function

###### FUNCTION - CHECK CLASSIFIER STATUS
watson.nlc.checkclassifierstatus <- function(classifier_id) {
  return(
    getURL(paste(base_url,classifier_id,sep=""),userpwd = username_password)
  )
}
### end of function


###### FUNCTION - DELETE CLASSIFIER - Receives name of Classifier to Kill; May not be able to do this until training complete
watson.nlc.deleteclassifier <- function(kill_classifier) {
  DELETE(url=(paste(base_url,kill_classifier,sep="")),authenticate(username,password))
}
 
### end of function

###### FUNCTION: ACCEPT QUERY & RETURN RESULT: CLASSIFIER and % FROM TEXT INPUT AND PROCESS TO LOOK GOOD
watson.nlc.processtextreturnclass <- function(classifier_id,query_text){
    query_text <- URLencode(query_text, reserved = TRUE)  # also escape &, ?, # etc. so they can't break the query string
    data <- getURL(paste(base_url,classifier_id,"/classify","?text=", query_text,sep=""),userpwd = username_password)
    data <- as.data.frame(strsplit(as.character(data),"class_name"))
    data <- data[-c(1), ] # remove dud first row
    data <- gsub("[{}]","", data)
    data <- gsub("confidence","", data)
    data <- data.frame(matrix(data))
    setnames(data,("V1"))
    data$V1 <- gsub("\"","", data$V1)
    data$V1 <- gsub(":","", data$V1)
    data$V1 <- gsub("]","", data$V1)
    data <- cSplit(data, 'V1', sep=",", type.convert=FALSE)
    setnames(data,c("class","confidence"))
  return(data) }
### end of function
 
###### FUNCTION: LIST ALL CLASSIFIERS AND RETURN NEAT LIST
watson.nlc.listallclassifiers <- function(){
  data <- getURL(base_url,userpwd = username_password )
  data <- as.data.frame(strsplit(as.character(data),"classifier_id"))
  data <- data[-c(1), ] # remove dud first row
  data <- data.frame(matrix(data))
  colnames(data) <- "V1"
  data$V1 <- gsub("[{}]","", data$V1)
  data$V1 <- gsub("]","", data$V1)
  data$V1 <- gsub("\"","", data$V1)
  data$V1 <- gsub("name:","", data$V1)
  data$V1 <- gsub(":","", data$V1)
  data <- cSplit(data, 'V1', sep=",", type.convert=FALSE)
  data[,c(2,4)] <- NULL
  data <- as.data.table(data)
  setnames(data,c("classifier","name","date_created"))
  data <- data[order(date_created),]
  return(data)
}

##### ACTION: EXECUTE FUNCTION  TO KILL (!!!) DELETE (!!!) CLASSIFIER - WARNING
watson.nlc.listallclassifiers()  # inventory - what do we want to delete - classifier id
kill <- "563C46x19-nlc-382"
watson.nlc.deleteclassifier(kill)
watson.nlc.listallclassifiers()  # check it's gone
## More NLC API DOCS here: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/natural-language-classifier/api/v1/#authentication
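The parser in `watson.nlc.processtextreturnclass` splits the raw response on the literal string `class_name` and strips punctuation, so it can misfire if the echoed query text happens to contain that string. A minimal base-R alternative extracts each class/confidence pair with a regular expression. It assumes the classify response carries a `classes` array of `{"class_name": ..., "confidence": ...}` objects with the keys in that order, the same assumption the original parser makes. `parse_nlc_classes` is a hypothetical name. Where the jsonlite package is installed, `jsonlite::fromJSON(json)$classes` returns the same table more robustly.

```r
# Hypothetical helper: pull (class, confidence) pairs out of a raw NLC classify
# JSON response, e.g. the string returned by getURL(...) in the function above.
parse_nlc_classes <- function(json) {
  # One match per {"class_name":"...","confidence":<number>} object.
  pattern <- "\"class_name\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"confidence\"\\s*:\\s*([-+0-9.eE]+)"
  hits <- regmatches(json, gregexpr(pattern, json, perl = TRUE))[[1]]
  data.frame(
    class      = sub(pattern, "\\1", hits, perl = TRUE),
    confidence = as.numeric(sub(pattern, "\\2", hits, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}
```

Returning confidence as a number rather than text also removes the need to convert it before sorting or merging results from several classifiers.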

 

 

 


About the Author

Ryan Anderson

Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

This is an informal blog that explores tools, code and tricks that group members have developed to engage IBM Watson cognitive computing services from the R programming language. Packages include RCurl to access the Watson APIs, for services that include Natural Language Classifier and Speech to Text. THIS IS MY PERSONAL BLOG - it does not represent the views of my employer. Code is presented as 'use at your own risk' (it has lots of bugs).

Created: September 13, 2015

