


Topological Data Analysis - Part 1

POSTED IN: Data Analytics & Visualization Blog

Topological Data Analysis - Part 1 (The Shape of Data)

I've been exploring TDA, or Topological Data Analysis, lately. It's fascinating stuff. Below are some links and a simplified overview to introduce it to non-experts and non-data scientists. Apologies in advance to the true experts ;) if I mangle anything in the attempt. You'll see many references to Ayasdi below. They're not the only game in town, but they've been doing this a while and have some great visualizations and YouTube tutorials, so I have focused on them for most of what follows. Another very smart fellow I've worked with in this area is Joey Richards at Wise.IO.

One way to think about TDA is as a tool to "distil" high-dimensional data (lots of columns) down to something you can get your head around (like the visualization below). Done right, it can be used to see the shape of the data and generate insights into really complex data sets. It's a great tool for exploration.

What's the definition?

Topological data analysis

 

Topological data analysis (TDA) is a new area of study aimed at having applications in areas such as data mining and computer vision. The main problems are:

  1. how one infers high-dimensional structure from low-dimensional representations; and
  2. how one assembles discrete points into global structure.

The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points.
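To make the second problem (assembling discrete points into global structure) a bit more concrete, here is a minimal sketch in Python. It is only one simple way to do it, not a full TDA pipeline: link any two points that lie within a chosen distance of each other, then see which groups hang together. The function name `connected_components`, the `epsilon` threshold, and the sample points are all made up for illustration.

```python
from itertools import combinations
from math import dist


def connected_components(points, epsilon):
    """Group point indices by linking any two points within epsilon of each other."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links up to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= epsilon:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two small clumps of points, far apart from each other (made-up data).
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0)]
components = connected_components(points, epsilon=1.0)
print(components)
```

With a tiny `epsilon` every point stands alone, and with a huge one everything merges into a single blob. Watching how the groups change as `epsilon` grows is a large part of what "the shape of data" means in practice.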

What does it look like?

     

The branches (or flares), together with the color of the nodes, which is based on a meaningful outcome, let the user understand which traits or shapes are at play.
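For the curious, here is a rough sketch of how a node-and-branch picture of this kind can be built. It assumes a Mapper-style approach: slice the data into overlapping ranges of some "lens" value, cluster the points inside each slice, and connect clusters that share points. This post does not say exactly what any particular product does under the hood, so treat it as a toy illustration. The names `cluster` and `mapper_graph`, the parameters, and the Y-shaped sample data are all assumptions.

```python
from itertools import combinations
from math import dist


def cluster(indices, points, epsilon):
    """Single-linkage clusters: chain together points within epsilon of each other."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        members, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= epsilon}
            unvisited -= near
            members |= near
            stack.extend(near)
        clusters.append(frozenset(members))
    return clusters


def mapper_graph(points, lens, outcome, n_intervals=3, overlap=0.5, epsilon=1.5):
    """Return (nodes, edges, colors) for a toy Mapper-style graph."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Stretch each slice so that it overlaps its neighbours.
        start = lo + k * width - overlap * width / 2
        end = lo + (k + 1) * width + overlap * width / 2
        inside = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(cluster(inside, points, epsilon))
    # Two nodes are linked when they share at least one data point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    # Node color: the average outcome of the points in that node.
    colors = [sum(outcome[i] for i in node) / len(node) for node in nodes]
    return nodes, edges, colors


# Made-up Y-shaped data: a stem that splits into an upper and a lower branch.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (4.0, 1.0), (5.0, 2.0), (6.0, 3.0),
          (4.0, -1.0), (5.0, -2.0), (6.0, -3.0)]
outcome = [p[1] for p in points]  # pretend the y value is the outcome we care about
nodes, edges, colors = mapper_graph(points, lens=lambda p: p[0], outcome=outcome)
print(len(nodes), edges, colors)
```

On this sample the result is four nodes and three edges: a stem, a junction, and two flares whose colors (average outcomes of 2.0 and -2.0) show that the branches differ in a way that matters. Real tools add far more care in choosing the lens, the slicing, and the clustering method.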

 

List of Links to Learn more: (work in progress, please suggest more)

 

I've also applied these methods to Unified Communications data, using R, and they work just as well for comms data as they do for life science data.

 


About the Author

Ryan Anderson


Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

Data Analytics & Visualization Blog - Generating insights from Data since 2013

Created: July 25, 2014
