Segmentation and Self Organizing Maps - Organizing and Interpreting IBM Watson Data (Part 1)

POSTED IN: Building Bridges from R to IBM Watson

Segmentation and Value Creation

Despite the grey hairs starting to appear in my bay-area-beard, I still love to play! Often I use IBM Watson services as my "Cognitive Lego" - because the modules help me take a sliver of an idea, expand and augment data sources, and then explore how it might be useful. Sometimes I can do this in a couple of hours. With the tools, and a bit of enthusiasm, I can create an "artifact" that I can show to my colleagues and our partners - and start an interesting conversation.

Sometimes the artifacts are mere curiosities, and often they are failures.   But sometimes....

... sometimes they become something more meaningful. I use a few different metaphors to describe these little artifacts - a spark, a cognitive catalyst, or a "seed crystal". The common theme is that the artifacts STIMULATE some new action - for someone who already has all of the other pieces ready in a supersaturated solution, and is just waiting for one tiny grain to start the chemical reaction.

I've seen this happen with a dozen IBM partners - and while my little piece is often a tiny part of their overall success - it is exciting and immensely gratifying.  

Seed Crystal:  Democratizing Data Discovery - specifically in the area of User Segmentation and Message Resonance 

One of these seed crystals relates to data discovery and user segmentation.

To be clear, what is below is nothing fancy.   To expert data scientists, stats experts, and marketing analyst gurus - this may be of little interest.  It's pretty blunt code, and rudimentary methods.  For the sake of speed, I often cut corners, fake data, and I do things with stats that probably make Stats Professors cringe. 

But that's OK.

Because I'm not publishing papers, or running experiments, or doing a PhD. I'm trying to stimulate thought and bring something (albeit imperfect) that is fresh and new to people who may not have been exposed to the ideas.

My audience includes the generalists, the strategists, the tinkerers and the builders - and all the other people out there who like to play and explore what might be possible.

Now don't get me wrong - there are no shortcuts on the way to a scientific, methodical and - ultimately - validated experiment. That takes clear KPIs, yardsticks for performance, and the ability to talk ROI. But that comes later. For now, we want to get people thinking. Divergent thinking. Creative thinking.

We want to use our little seed crystal to help people explore the ART OF THE POSSIBLE!


The Value of User Segmentation

Much of IBM Watson's value is about taking unstructured data (text, tweets, SMS messages, chat-bot scripts, stories, resumes, news articles, speech audio, pictures and images) and SURFACING SIGNALS about what is inside the data. It's like a little oil refinery - cracking the data-carbon.

In doing this, IBM Watson Services help people LISTEN to the data.  And to SEE new things.


Once we begin to listen - and tune in to the signals that matter - we begin to hear, understand and interpret. This is - as one of my business school professors eloquently put it - SENSEMAKING.


Election Example - Independent Swing Votes

Let's consider an example. Taking some synthetic (fake) data ;) - imagine a US Presidential Election scenario, and a data sample of eligible voters in a swing state. We have data from them that includes, but is not limited to, social media (Facebook, Twitter). Data may also include samples from more direct methods like telephone or face-to-face surveys.


On the left, in the nine-pie array, each bucket represents approximately 1000 people grouped alongside the folks who most closely share their traits. The heatmap on the right shows how many people are in each bucket.

For a media team running a political/marketing/social-media campaign, this very simple visualization takes an array of half a million data points, from 100k imaginary people, and shows us how we can start to segment the population into bite-sized pieces. We can start to give the groups names like "Most Likely Swing Voters" and think about where we might want to spend our money and time, and how we might adjust our strategies (i.e. the "actionable intelligence").
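To make the idea concrete, here is a rough sketch of how a 3x3 self-organizing map sorts people into nine buckets. It is plain Python with no packages, not the kohonen R package linked below, and not the code behind the plots described above. Everything in it is an assumption for illustration: the 900 fake voters, the five traits per person (inferred from half a million data points across 100k people), the three made-up trait profiles, and the learning-rate and neighbourhood settings.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to person x."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], x)))


def train_som(data, rows=3, cols=3, epochs=10, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start each node at a randomly chosen person.
    nodes = [list(rng.choice(data)) for _ in range(rows * cols)]
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.01    # neighbourhood shrinks
            b = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                d2 = ((coords[j][0] - coords[b][0]) ** 2
                      + (coords[j][1] - coords[b][1]) ** 2)
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                # Pull the winner (and, more weakly, its grid neighbours)
                # towards this person.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return nodes


def make_fake_voters(n=900, seed=1):
    """Synthetic people with 5 traits in [0, 1], drawn around 3 made-up profiles."""
    rng = random.Random(seed)
    profiles = [(0.2, 0.8, 0.5, 0.3, 0.7),
                (0.8, 0.2, 0.5, 0.6, 0.2),
                (0.5, 0.5, 0.9, 0.1, 0.5)]
    return [[min(1.0, max(0.0, rng.gauss(m, 0.1))) for m in profiles[i % 3]]
            for i in range(n)]


voters = make_fake_voters()
som_nodes = train_som(voters)

# Count how many people land in each bucket (the numbers behind a heatmap).
counts = [0] * len(som_nodes)
for person in voters:
    counts[best_matching_unit(som_nodes, person)] += 1

for r in range(3):
    print(counts[r * 3:(r + 1) * 3])
```

Each entry of `som_nodes` is the "typical person" for one bucket (what a pie in the nine-pie array would draw), and `counts` holds the bucket sizes. Dividing a count by the total gives the share of the population in that bucket.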




KOHONEN PACKAGE: https://cran.r-project.org/web/packages/kohonen/kohonen.pdf


Moving Towards Actionable Intelligence

Now that we've done this - the imaginary political team can start to think about things a little differently.

The team can have DATA-DRIVEN conversations about strategy. And more importantly, we can shift from a divergent-creative thinking approach to a more convergent-methodical approach, where we can develop hypotheses and test them experimentally.


In the case above, an experimental hypothesis might be: for the 18% of swing state voters in bucket #3 with a high-amplitude 'silent majority signal' (i.e. not expressing views on A, B and C), we believe direct messaging of X by channel Y over 7 days will result in a Z% positive change in sentiment towards our candidate.
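One blunt way to check a hypothesis like this is to message part of the bucket, hold back a control group, and compare the share of people with positive sentiment in each. The sketch below uses a two-proportion z-test, which is my pick for illustration and only one of many possible tests. The function name and the counts (230 of 500 versus 200 of 500) are made-up placeholders, not results.

```python
import math


def two_proportion_z_test(pos_treated, n_treated, pos_control, n_control):
    """Compare positive-sentiment rates between a messaged group and a control group.

    Returns (lift, z, two-sided p-value) using the pooled normal approximation.
    """
    p_treated = pos_treated / n_treated
    p_control = pos_control / n_control
    pooled = (pos_treated + pos_control) / (n_treated + n_control)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_treated + 1.0 / n_control))
    z = (p_treated - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return p_treated - p_control, z, p_value


# Hypothetical counts: 230 of 500 messaged voters positive vs 200 of 500 controls.
lift, z, p_value = two_proportion_z_test(230, 500, 200, 500)
print(f"lift={lift:.3f}  z={z:.2f}  p={p_value:.4f}")
```

The lift is the "Z%" in the hypothesis above. The p-value is a rough guide to whether a difference of that size could plausibly be noise, and it assumes people were assigned to the two groups at random.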


From Insight to Action - So Who Cares?!?

The people who are typically interested in this information - and in acting on it - generally (a) spend money or time to reach out to the wider world and (b) try to listen to, understand and then positively influence the hearts and minds of a subset of the population.

  • Fortune 500 Companies - Strategy Teams, Brand Managers and Marketing Teams, plus KPIs for the CXO office

  • Social Media Marketing Teams - KPIs tie to efficiency of each marketing dollar spent

  • Political Campaigns and Political Action Committees (PACs)

  • Internal Company Management / Strategy and HR - large companies seeking a better understanding of workforce (voice of employee)



So briefly - the example and illustrations above are intended to:

  • Provide a visual introduction to Self Organizing Maps - and show where they can be useful in making sense of large, heterogeneous data sets
  • Suggest an approach to data discovery for teams in a divergent-creative mode
  • Propose steps by which users can then transition to a convergent-methodical mode; and
  • Show why this method of organizing data for large populations can result in improved understanding, better decisions, and actionable intelligence with measurable outcomes.


Sources and Further Reading

  1.  R Package - Kohonen Self Organizing Maps - Functions to train supervised and self-organising maps (SOMs). Also interrogation of the maps and prediction using trained maps are supported. The name of the package refers to Teuvo Kohonen, the inventor of the SOM.  https://cran.r-project.org/web/packages/kohonen/kohonen.pdf
  2.  R Package Diffusion Map - Implements diffusion map method of data parametrization, including creation and visualization of diffusion map, clustering with diffusion K-means and regression using adaptive regression model. http://www.stat.berkeley.edu/~jwrichar/software/diffusionMap-manual.pdf
  3.  R Scripts for IBM Watson - code - https://github.com/rustyoldrake/R_Scripts_for_Watson
  4.  Blog Examples of other things https://dreamtolearn.com/ryan/r_journey_to_watson/ and https://dreamtolearn.com/ryan/data_analytics_viz
  5.  LIST of Links to Other Watson Resources - https://dreamtolearn.com/ryan/r_journey_to_watson/13
  6. Sensemaking - https://en.wikipedia.org/wiki/Sensemaking












About the Author

Ryan Anderson

Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

This is an informal blog that explores tools, code and tricks that group members have developed to engage IBM Watson cognitive computing services from the R Programming Language. Packages include RCurl to access Watson APIs, for services that include Natural Language Classifier and Speech to Text. THIS IS MY PERSONAL BLOG - it does not represent the views of my employer. Code is presented as 'use at your own risk' (it has lots of bugs).

Created: September 13, 2015

