Toying with Topological Data Analysis - Part 2 (Cancer Biopsy)

POSTED IN: Data Analytics & Visualization Blog

Objective: Continue exploring "basic" tools, namely dimension reduction and random forests, as a way into Topological Data Analysis (TDA)...

...this time with the Cancer Biopsy dataset in place of the IRIS dataset used in Part 1.

############################################
## Toying with Topological Data Analysis
## Testing on Biopsy Data on Breast Cancer Patients
## source : http://vincentarelbundock.github.io/Rdatasets/doc/MASS/biopsy.html
## Description: This breast cancer database was obtained from the University
## of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. He assessed
## biopsies of breast tumours for 699 patients up to 15 July 1992; each of
## nine attributes has been scored on a scale of 1 to 10, and the outcome is
## also known. There are 699 rows and 11 columns.
# V1 clump thickness.
# V2 uniformity of cell size.
# V3 uniformity of cell shape.
# V4 marginal adhesion.
# V5 single epithelial cell size.
# V6 bare nuclei (16 values are missing).
# V7 bland chromatin.
# V8 normal nucleoli.
# V9 mitoses.
# class = outcome > "benign" or "malignant". (B or M)
############################################
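
As a rough sketch of the starting point, the data described in the header above can be loaded straight from R's MASS package. This is an assumption for illustration: the post itself works from a cleaned-up CSV, and the names `biopsy_clean`, `features` and `outcome` are made up here.

```r
# Minimal sketch (assumption: using MASS::biopsy rather than the post's CSV).
library(MASS)

data(biopsy)                      # 699 rows x 11 columns: ID, V1..V9, class

# Drop the 16 rows where V6 (bare nuclei) is missing.
biopsy_clean <- na.omit(biopsy)

# Hypothetical names: the nine scored attributes and the known outcome.
features <- biopsy_clean[, paste0("V", 1:9)]
outcome  <- biopsy_clean$class    # factor with levels "benign" / "malignant"

str(features)
table(outcome)
```

Dropping incomplete rows is only one way to get a "cleaned up" table; imputing V6 would be another.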

Short version: Replacing the IRIS data with the biopsy data, and training on the malignant / benign outcome, showed good results. The exploration continues...


Here is the data (cleaned-up CSV): https://docs.google.com/file/d/0BwjxYjWyopXhV3BqMUoteWhJRE0
Here is the code: https://dreamtolearn.com/doc/25QTCXJLWHSSQ5X1XDYDIDNU0#

Here is a YouTube Video of us running through the code on our weekly hangout: http://www.youtube.com/watch?v=ckaaZl7kEvE

What we see:

Basic Plot of the 10 Dimensions 

 

Initial Diffusion Map on first 9 columns (promising)
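
For readers who want to try something similar, here is a hedged sketch of a diffusion map on the nine attribute columns. It assumes the CRAN `diffusionMap` package and the MASS copy of the data, which may not be what the linked code uses. The neighbourhood fraction `p = 0.05` and the choice of three coordinates are arbitrary illustration values.

```r
# Sketch only: diffusion map on V1..V9 (assumes the diffusionMap package).
library(MASS)
library(diffusionMap)

data(biopsy)
biopsy_clean <- na.omit(biopsy)               # drop rows with missing V6
features <- biopsy_clean[, paste0("V", 1:9)]  # all scored on the same 1-10 scale

# Pairwise Euclidean distances between patients.
D <- dist(features)

# Kernel width from the data; p = 0.05 is an assumed, tunable value.
eps <- epsilonCompute(D, p = 0.05)

# Keep the first three diffusion coordinates.
dmap <- diffuse(D, eps.val = eps, neigen = 3)

# Colour the first two coordinates by outcome to see whether the classes separate.
cols <- ifelse(biopsy_clean$class == "malignant", "red", "blue")
plot(dmap$X[, 1], dmap$X[, 2], col = cols, pch = 19,
     xlab = "Diffusion coordinate 1", ylab = "Diffusion coordinate 2",
     main = "Diffusion map of biopsy attributes (sketch)")
```

If the two colours fall into largely separate regions, that matches the "promising" impression noted in the caption above.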

 

What's important 

 

Dendrogram from Random Forest. I worry a bit about overfitting (need to look into this).
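
A minimal sketch of how the importance plot and a random-forest dendrogram could be produced is below. It assumes the CRAN `randomForest` package and the MASS copy of the data. The seed, `ntree = 500` and average linkage are illustrative choices, not necessarily those in the linked code. The out-of-bag (OOB) error printed with the model is one common first check on the overfitting worry.

```r
# Sketch only: random forest on the biopsy outcome (assumes randomForest).
library(MASS)
library(randomForest)

data(biopsy)
biopsy_clean <- na.omit(biopsy)               # drop rows with missing V6
features <- biopsy_clean[, paste0("V", 1:9)]
outcome  <- biopsy_clean$class

set.seed(42)                                  # arbitrary seed for repeatability
rf <- randomForest(x = features, y = outcome, ntree = 500,
                   importance = TRUE, proximity = TRUE)

# Printing the model shows the OOB error estimate and confusion matrix,
# computed on samples each tree did not see during training.
print(rf)

# "What's important": variable importance per attribute.
varImpPlot(rf)

# Dendrogram: turn the forest's proximity matrix into a dissimilarity
# and cluster it hierarchically.
hc <- hclust(as.dist(1 - rf$proximity), method = "average")
plot(hc, labels = FALSE, main = "Dendrogram from random forest proximity (sketch)")
```

An OOB error close to the training error would suggest the forest is not badly overfit; a held-out test split would be a stronger check.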

 

Coming at it from Inverse

 

And, FINALLY, a bubble chart of the data. In the end I opted NOT to go with D3 or GoogleFusion this time, but rather the googleVis library and gvisBubbleChart:
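
A hedged sketch of a `gvisBubbleChart` call is below. Which columns the original chart used is not stated, so the choice here (V1 against V2, bubble size as patient count, colour as outcome) and the helper columns `patients` and `label` are assumptions for illustration.

```r
# Sketch only: bubble chart with googleVis (column choices are assumptions).
library(MASS)
library(googleVis)

data(biopsy)
biopsy_clean <- na.omit(biopsy)               # drop rows with missing V6

# Count patients for each (V1, V2, class) combination.
biopsy_clean$patients <- 1
counts <- aggregate(patients ~ V1 + V2 + class, data = biopsy_clean, FUN = sum)

counts$class <- as.character(counts$class)    # plain strings for the colour series
counts$label <- paste(counts$V1, counts$V2, sep = "/")  # hypothetical bubble id

chart <- gvisBubbleChart(
  counts,
  idvar = "label", xvar = "V1", yvar = "V2",
  colorvar = "class", sizevar = "patients",
  options = list(
    title = "Biopsy: clump thickness vs uniformity of cell size",
    hAxis = "{title: 'V1 clump thickness'}",
    vAxis = "{title: 'V2 uniformity of cell size'}"
  )
)

# plot() opens the chart in a browser, so only do that interactively.
if (interactive()) plot(chart)
```

Aggregating first keeps the number of bubbles manageable; plotting all patients individually would stack many identical points on top of each other.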

 

 


About the Author

Ryan Anderson


Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

Data Analytics & Visualization Blog - Generating insights from Data since 2013

Created: July 25, 2014
