Topological Data Analysis - Part 1 (The Shape of Data)
I've been exploring TDA (Topological Data Analysis) lately. It's fascinating stuff. Below are some links and a simplified overview to provide an introduction to it for non-experts and non-data scientists. Apologies in advance to the true experts ;) if I mangle anything in the attempt. You'll see many references to Ayasdi below. They're not the only game in town, but they've been doing this a while and have some great visualizations and YouTube tutorials, so I have focused on them for most of the links. Another very smart fellow I've worked with in this area is Joey Richards at Wise.IO.
One way to think about TDA is as a tool to "distil" high-dimensional data (lots of columns) down to something you can get your head around (like the visualization below). Done right, it can be used to see the shape of data and generate insights into really complex data sets. It's a great tool for exploration.
What's the definition?
Topological data analysis addresses two main problems:
- how one infers high-dimensional structure from low-dimensional representations; and
- how one assembles discrete points into global structure.
The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points.
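To make "assembling discrete points into global structure" a bit more concrete, here is a minimal sketch in Python (my choice for illustration, not taken from any of the tools mentioned in this post). It links any two points closer than a chosen scale `eps` and counts the connected pieces that result. The function names, the two-circle test data and the `eps` values are all assumptions made up for the example.

```python
import math

def count_components(points, eps):
    """Count connected pieces when points closer than eps are linked together."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group (union-find).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})

def circle(cx, cy, n=40, r=1.0):
    """Hypothetical test data: n evenly spaced points on a circle."""
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Two separate unit circles, centred 5 units apart.
points = circle(0, 0) + circle(5, 0)

for eps in (0.05, 0.3, 3.5):
    print(eps, count_components(points, eps))
```

At a very small scale every point is its own island, at a middling scale the two circles show up as two pieces, and at a large scale everything merges into one. Which scale is the "right" one is a judgment call, and that is a big part of why TDA looks at structure across a range of scales.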
What does it look like?
The branches / flares, and the color of the nodes (based on a meaningful outcome), let the user understand what traits or shapes are at play.
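For the curious, here is a rough idea of how a node-and-edge picture like this can be built. This is a toy sketch in the spirit of the Mapper algorithm, which I'm assuming is the kind of method behind these graphs. It is not Ayasdi's implementation, and every name and parameter in it (`mapper`, `lens`, `n_intervals`, `overlap`, `eps`) is made up for illustration. The idea: pick a "lens" (one number per data point), slice the lens range into overlapping intervals, cluster the points inside each slice, and connect clusters that share points.

```python
import math
from itertools import combinations

def components(indices, points, eps):
    """Cluster the given point indices: points closer than eps are chained together."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        stack, cluster = [seed], {seed}
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) < eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, lens, n_intervals=4, overlap=0.3, eps=0.3):
    """Toy Mapper-style graph: nodes are clusters, edges join clusters sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap  # how far each slice reaches into its neighbours
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# Hypothetical data: 60 evenly spaced points on a unit circle, lens = x-coordinate.
circle_points = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60))
                 for k in range(60)]
nodes, edges = mapper(circle_points, lens=lambda p: p[0])
print(len(nodes), "nodes,", len(edges), "edges")
```

With these particular settings the circle comes back as six nodes joined in a ring, so the graph keeps the loop even though the individual points are gone. Real tools add a lot on top (better clustering, smarter lenses, coloring nodes by an outcome), but the basic recipe is the same flavor.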
List of Links to Learn more: (work in progress, please suggest more)
- Professor Gunnar Carlsson Introduces Topological Data Analysis http://www.youtube.com/watch?v=XfWibrh6stw
- Marketing Intro http://www.youtube.com/watch?v=KVRx3r8qeQs#t=165
- Demo of Breast Cancer Data http://www.youtube.com/watch?v=F-C_B_Fmx7Q - (ESR1 - gene for estrogen expression vs survival)
- Dense Whiteboard talk http://www.youtube.com/watch?v=4RNpuZydlKY good info, but be ready to focus :)
I've also used these methods on analysis of Unified Communications data, using R, and it works just as well for comms data as it does for life science data.
About this blog
Data Analytics & Visualization Blog - Generating insights from Data since 2013
Created: July 25, 2014