
Basic Machine Learning (Classification) on 'new' Iris Data Set (Wise.IO)

POSTED IN: Data Analytics & Visualization Blog

In the blog post below, just for grins, I fabricated another 50 data elements and called them Iris Mythica.

Now I want to run some basic machine learning on the "new" data to see how it performs.

There are a number of great tools out there, including a package for R. For my example here, however, I'm going to use the machine learning engine from Wise.IO, because I'm familiar with it and know the guys there. Given the power of the Wise tools, it's way more horsepower than needed, but what the heck! I'll step through the R solution in a post coming soon.

Goal: Feed a PORTION of the data into an ML tool and let the tool learn. Then test/validate the tool by submitting the held-back data for blind classification, to see what the model thinks each sample is. Finally, remove the blindfold, cross-check the results, and see how good the tool was. (Spoiler alert: pretty good.)

Steps:

  1. GET the CSV file of 200 data points. Each row should have an Index, 4 attributes for the sample, and a class (classification).
  2. SEPARATE LEARN/TEST. We're not going to give the ML tool all of the data, just a big chunk. We'll hold some back to test the model later and see how smart it is. Method: I simply used =RAND() and then told the spreadsheet to tag as TEST any row whose random value was more than 0.8.
    1. I ended up with 153 rows to learn from (CSV) and held back 47 rows to test with (CSV, class info removed).
  3. SUBMIT the learn data (153 rows) to the ML model. In this case, I told the model to ignore the Index (1-200, incremental), use the 4 attributes, and classify on class (setosa, versicolor, virginica and "mythica").
  4. MACHINE LEARNING (magic :) ). More to come on the methods in the next blog post. The model was built successfully.
  5. PREPARE the 'Test' data. Strip out the 'class' tag on the 47 data elements, because that's what we want the ML to predict for us. Upload the file and ask the model to predict the class of each row.
  6. RECEIVE a file with the classifications. Examine the confidence values (most were above 90%).
  7. COMPARE the predictions to the actual classes (remove the blindfold). The comparison is available as a CSV.
    1. The model got 46 out of 47 correct with default settings, and with high confidence. In summary, a good outcome.
    2. The one error: the model classified row 120 of the test data, with attributes 6, 2.2, 5, 1.5, as versicolor (it's Iris-virginica). I noted later that this sample had the lowest value in the test group for 3 of the 4 attributes and the second lowest for the other, so it was near the edge of its cluster.
  8. CONCLUSION: Both the data set and the tool performed well. The tool classified 46 of 47 test samples correctly (about 97.9%).
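
The spreadsheet parts of the steps above (the random learn/test tagging, the blindfold, and the final comparison) could also be scripted. What follows is only a minimal sketch in plain Python, not what I actually ran: the column names "index" and "class", the function names, and the shape of the predictions (a dict from index to predicted class) are all assumptions for illustration. The training and prediction calls to Wise.IO are deliberately left out, since nothing in this post describes that API.

```python
import random


def split_learn_test(rows, threshold=0.8, seed=None):
    """Mimic the =RAND() trick: a row goes to TEST when its random draw
    exceeds the threshold, otherwise it goes to LEARN."""
    rng = random.Random(seed)
    learn, test = [], []
    for row in rows:
        if rng.random() > threshold:
            test.append(row)
        else:
            learn.append(row)
    return learn, test


def strip_class(rows, class_key="class"):
    """Put the blindfold on: return copies of the rows without the class column.
    The column name "class" is an assumption."""
    return [{k: v for k, v in row.items() if k != class_key} for row in rows]


def score(actual_rows, predictions, index_key="index", class_key="class"):
    """Take the blindfold off: compare predicted classes with the actual ones.

    `predictions` is assumed to be a dict mapping a row's index to the
    predicted class. Returns (number correct, total, list of misses), where
    each miss is a tuple (index, actual class, predicted class).
    """
    misses = []
    for row in actual_rows:
        idx = row[index_key]
        predicted = predictions.get(idx)
        if predicted != row[class_key]:
            misses.append((idx, row[class_key], predicted))
    total = len(actual_rows)
    return total - len(misses), total, misses
```

Rows read with csv.DictReader would fit this shape directly. Because the split is random per row, the test set will not be exactly 20% of the data; with 200 rows, a count somewhere around 40 is typical, which is in line with the 47 I ended up with.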


About the Author

Ryan Anderson


Hi! I like to play with data, analytics and hack around with robots and gadgets in my garage. Lately I've been learning about machine learning.

About this blog

Data Analytics & Visualization Blog - Generating insights from Data since 2013

Created: July 25, 2014
