
Toying with Topological Data Analysis - Part 1 (IRIS)

POSTED IN: Data Analytics & Visualization Blog

Objective: Apply dimension reduction and random forest tools to explore Topological Data Analysis on a very simple data set (IRIS Mythica).

############################################
## Toying with Topological Data Analysis  ##
## Testing on IRIS Mythica - 200 samples  ##
## four types (3 standard + 1 fabricated) ##
## R. Anderson March 2014                 ##
############################################

Source Code: https://dreamtolearn.com/doc/EBWR7ILRB0Q39IJYVJEZPPT0F

 

Steps 1-4: The source code maps to the images below. PLOT 1 is simply the scatterplot of the IRIS data. We are using an 'augmented' :) version of IRIS that has 4 groups of 50 = 200 samples; Mythica is the invented set. PLOT 2 shows the diffusion of dist(scale(data)), which uses Euclidean distance on the scaled data. PLOT 3 is generated after we run fit = randomForest(data, species, ntree=100, proximity=TRUE). PLOT 4 is actually at the bottom of the source code; it uses plotBarcodeDiagram to find our friend Betti.
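For readers who do not use R, here is a minimal Python sketch of what dist(scale(data)) computes, plus a count of connected components at one distance threshold (the 0-dimensional Betti number that a barcode diagram tracks as the threshold grows). This is not the original source code: the function names, the threshold values and the toy rows are all made up for illustration.

```python
import math
from statistics import mean, stdev

def scale_columns(rows):
    """Center each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def distance_matrix(rows):
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def betti0(dist, eps):
    """Count connected components when points within eps of each other are linked."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Hypothetical toy rows (two tight, well-separated groups), not IRIS data.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
d = distance_matrix(scale_columns(toy))
print(betti0(d, 0.5))  # components at a mid-range threshold
```

Sweeping eps from small to large and recording when components merge gives the dimension-0 bars of a barcode; higher-dimensional features need a proper persistent homology library.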

 

PLOT 5 is our Cluster Dendrogram; PLOT 6 shows where our species ended up in the cluster (S = setosa; V = versicolor; I = virginica; M = mythica); PLOT 7 is the cluster with 4 bins; and PLOT 8 is the same data with 10 bins.
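Cutting a dendrogram into a fixed number of bins can be sketched in Python as below. It assumes complete linkage, which may not be the linkage the original source code used, and it stops merging once k groups remain, which is equivalent to cutting the tree at k bins. The function name and toy points are hypothetical.

```python
import math

def complete_linkage_cut(points, k):
    """Agglomerative clustering (complete linkage), stopped at k clusters.

    Returns a 1-based bin label for every point.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                g = gap(clusters[x], clusters[y])
                if best is None or g < best[0]:
                    best = (g, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical toy points: three obvious pairs.
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
bins3 = complete_linkage_cut(pts, 3)
print(bins3)
```

This brute-force version is only meant for a few hundred points, such as the 200 samples here; the same data can be cut at 4 or 10 bins simply by changing k.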

 

We then export the data to CSV and to Google Fusion Tables (https://www.google.com/fusiontables/DataSource?snapid=S125080841F-). This is still a work in progress.

PLOT 9 shows the clusters chained together (10 bins). Unfortunately the bubble size does not show the number of points in each bin as hoped; rather, the thickness of the connecting band reflects this. PLOT 10 is simply the count of samples in each of the 10 bins. PLOT 11 is interesting: it shows the connection of species to bin. If we had a perfect 'learn' this would be simpler, but it shows where there are overlaps: where bins contain more than one species (not desired) or where some species run across multiple bins (this is OK).
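The species-to-bin check can also be done as a small cross-tabulation. The Python sketch below counts samples per bin, flags bins that hold more than one species, and lists species that spread over several bins. The labels and bin numbers are toy values, and the helper names are made up for this example.

```python
from collections import Counter, defaultdict

def crosstab(species, bins):
    """Map each bin to a Counter of the species found in it."""
    table = defaultdict(Counter)
    for s, b in zip(species, bins):
        table[b][s] += 1
    return table

def bin_sizes(table):
    """Number of samples in each bin."""
    return {b: sum(counts.values()) for b, counts in table.items()}

def mixed_bins(table):
    """Bins containing more than one species (the undesired case)."""
    return sorted(b for b, counts in table.items() if len(counts) > 1)

def spread_species(table):
    """Species that appear in more than one bin (the acceptable case)."""
    seen = defaultdict(set)
    for b, counts in table.items():
        for s in counts:
            seen[s].add(b)
    return sorted(s for s, bs in seen.items() if len(bs) > 1)

# Hypothetical toy labels: S, V, I, M as in the plots.
species = ["S", "S", "V", "V", "I", "M"]
bins = [1, 1, 2, 3, 3, 4]
table = crosstab(species, bins)
print(bin_sizes(table), mixed_bins(table), spread_species(table))
```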

 

Next stop: JSON. Then I will see about pushing D3-compatible data into a force-directed graph or similar. Learning curve, here we come!
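As a rough idea of what that JSON could look like, the Python sketch below turns species and bin labels into the nodes/links shape that D3 force-layout examples commonly use. The field names ("id", "group", "size", "value") are assumptions about how the graph would be wired up on the D3 side, and the input labels are toy values.

```python
import json
from collections import Counter

def to_force_graph(species, bins):
    """Build a nodes/links dict linking each species to the bins it lands in."""
    pair_counts = Counter(zip(species, bins))
    sizes = Counter(bins)
    nodes = [{"id": f"species:{s}", "group": "species"}
             for s in sorted(set(species))]
    nodes += [{"id": f"bin:{b}", "group": "bin", "size": sizes[b]}
              for b in sorted(sizes)]
    # Link value = number of samples of that species in that bin.
    links = [{"source": f"species:{s}", "target": f"bin:{b}", "value": n}
             for (s, b), n in sorted(pair_counts.items())]
    return {"nodes": nodes, "links": links}

# Hypothetical toy labels, not the real cluster output.
graph = to_force_graph(["S", "S", "V", "V", "I", "M"], [1, 1, 2, 3, 3, 4])
payload = json.dumps(graph, indent=2)
print(payload)
```

Putting the per-bin count in "size" and the per-pair count in "value" keeps both numbers available, so bubble size and band thickness could each be driven by the quantity they are meant to show.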

Cheers

Ryan

About the Author

Ryan Anderson

Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

Data Analytics & Visualization Blog - Generating insights from data since 2013

Created: July 25, 2014
