Segmentation PART 1 - Reagan Classification Machine - An exploration of IBM Watson, Personality Insights, & Segmentation Methods

POSTED IN: Building Bridges from R to IBM Watson

Election Time is upon us!  Once again, we will be hearing many presidential candidates reference Ronald Reagan fondly.  But who would the Gipper approve of?  Who is most like Ronald Reagan? 

Wouldn't it be wonderful if we could build a Reagan Classification Machine to help the candidates answer this question?   Well, we can!   ;)  

Below is a summary of Part 1 of a proof of concept: an IBM Watson-powered analysis that uses cognitive computing and machine learning to explore how a quarter million words of speech content from ~60 major leaders of the 20th century can yield 52 very interesting personality traits.

PPT DECK https://drive.google.com/file/d/0BwjxYjWyopXhN0g5bFdCb1FocGs/view


Overview & Objective

Exercise: An exploration of IBM Watson's Personality Insights service, and of segmentation methods using the R programming language.
Goal: Develop re-usable code for the wider community, to aid in exploring and experimenting with post-processing of "PI" data.
Assumption: That the spectrum of personality attributes that PI produces can be used to discover actionable insights.
1. We believe that a Personality Insights analysis of ~100 major speeches of the 20th century will reveal traits and signal unique to each speaker, and that simple machine learning models (like random forest) can use this information to classify new test data and generate predictions with confidence. (PART 1)


IBM WATSON - Personality Insights Service:

Personality Insights extracts and analyzes a spectrum of personality attributes to help discover actionable insights about people and entities, and in turn guides end users to highly personalized interactions. The service outputs personality characteristics: the Big 5, Values, and Needs.
Personality Insights can also help with market segmentation and individualizing campaigns or promotions, and can be used to help recruiters or university admissions match candidates to companies or universities. Overall, Personality Insights individualizes and infers personality traits to drive a more tailored response and understanding.
Overview: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/personality-insights.html
Science Behind the Service: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/personality-insights/science.shtml
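
As a rough illustration of how output like this can be turned into something a model can use, here is a minimal Python sketch (not the R code from this project) that flattens a nested tree of traits into one flat set of named scores. The dictionary layout, the field names "name", "percentage" and "children", the helper flatten_traits and all the numbers are assumptions made up for the example; check the service documentation for the real response format.

```python
def flatten_traits(node, out=None):
    """Collect {trait name: score} from a nested trait tree."""
    if out is None:
        out = {}
    # Only nodes that carry a score become a feature.
    if "percentage" in node:
        out[node["name"]] = node["percentage"]
    # Recurse into sub-traits, if any.
    for child in node.get("children", []):
        flatten_traits(child, out)
    return out


# Hypothetical, heavily truncated result for one speech (made-up values).
sample = {
    "name": "root",
    "children": [
        {"name": "Big 5", "children": [
            {"name": "Openness", "percentage": 0.91, "children": [
                {"name": "Intellect", "percentage": 0.88},
            ]},
        ]},
        {"name": "Needs", "children": [
            {"name": "Ideal", "percentage": 0.35},
        ]},
        {"name": "Values", "children": [
            {"name": "Self-transcendence", "percentage": 0.42},
        ]},
    ],
}

features = flatten_traits(sample)
print(features)
```

One flat row like this per speech is the kind of table that segmentation or classification code can then work on.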



  1. Located and scraped ~100 (93 after scrubbing) famous speeches from the 20th century: US-centric, in English, with several speakers contributing multiple speeches (FDR, LBJ, RR).
  2. The IBM Watson Personality Insights service APIs analyzed 258 thousand words and returned PI data for each speaker and each speech.
  3. Summarized the information across 52 dimensions.
  4. Flagged Ronald Reagan and JFK for some dimensionality reduction, to look at similarities and differences between the two and at their traits relative to other speakers, and to see how RR's 6 different speeches stack up against each other. Noted the PI traits that seemed to provide the most signal/insight into RR.
  5. Built a very simple POC of a random forest machine learning model in R, using R's randomForest package. Trained, tested, and noted results. Testing also included a couple of softballs, 100 Synthetic Reagans and 1000 Reagan Wannabes, to see how well the ML model performed at classification (it correctly classified about half the Synthetic Reagans and nearly 100% of the Reagan Wannabes).
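
The last step can be sketched in a few lines. This is not the project's R code and does not use the randomForest package: it is a toy stand-in in plain Python, a bagged ensemble of one-split decision stumps, each looking at one randomly chosen trait and voting. The "Reagan" profile, the 5 traits (the post uses 52), the noise levels, and the way Synthetic Reagans and Wannabes are generated here are all assumptions for illustration, so the rates it prints are not the results reported above.

```python
import random

rng = random.Random(42)   # fixed seed so runs are repeatable
N_TRAITS = 5              # assumption: the post works with 52 traits


def jitter(profile, scale):
    """Noisy copy of a trait profile, clamped to the 0..1 range."""
    return [min(1.0, max(0.0, v + rng.gauss(0, scale))) for v in profile]


def random_profile():
    return [rng.random() for _ in range(N_TRAITS)]


def stump_vote(stump, x):
    """A stump votes low_label at or below its threshold, else the opposite."""
    f, t, low_label = stump
    return low_label if x[f] <= t else 1 - low_label


def fit_stump(rows):
    """Best single threshold on one randomly chosen trait."""
    f = rng.randrange(N_TRAITS)
    best = None
    for x, _ in rows:
        for low_label in (0, 1):
            stump = (f, x[f], low_label)
            errors = sum(stump_vote(stump, xi) != yi for xi, yi in rows)
            if best is None or errors < best[0]:
                best = (errors, stump)
    return best[1]


def fit_forest(rows, n_trees=50):
    """Train each stump on a bootstrap resample of the rows."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote: 1 = Reagan-like, 0 = not."""
    votes = sum(stump_vote(s, x) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


# Hypothetical profile; real values would come from the PI output.
reagan = [0.9, 0.2, 0.7, 0.4, 0.6]

# Training set: lightly jittered "Reagan" rows vs. random "other speaker" rows.
train = [(jitter(reagan, 0.05), 1) for _ in range(30)]
train += [(random_profile(), 0) for _ in range(30)]
forest = fit_forest(train)

# Softball test sets (generation scheme is an assumption).
synthetic_reagans = [jitter(reagan, 0.15) for _ in range(100)]
wannabes = [random_profile() for _ in range(1000)]

reagan_rate = sum(predict(forest, x) for x in synthetic_reagans) / 100
wannabe_rate = sum(1 - predict(forest, x) for x in wannabes) / 1000
print(f"Synthetic Reagans flagged as Reagan: {reagan_rate:.0%}")
print(f"Wannabes flagged as not Reagan: {wannabe_rate:.0%}")
```

A real random forest grows deeper trees and tries several traits at each split, so treat this only as a picture of the train, test, and softball workflow.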



Results and More Information:

PowerPoint deck with story & images here: https://drive.google.com/file/d/0BwjxYjWyopXhN0g5bFdCb1FocGs/view

CSV files here: https://drive.google.com/folderview?id=0BwjxYjWyopXhRWVmMThsX1ZFRjQ&usp=sharing

GITHUB: https://github.com/rustyoldrake/ronald_reagan_detecter - still developing


About the Author

Ryan Anderson

Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

This is an informal blog that explores tools, code and tricks that group members have developed to engage IBM Watson cognitive computing services from the R programming language. Packages include RCurl to access Watson APIs, for services that include Natural Language Classifier and Speech to Text. THIS IS MY PERSONAL BLOG - it does not represent the views of my employer. Code is presented as 'use at your own risk' (it has lots of bugs).

Created: September 13, 2015

