In the blog posts below, we played around with the 'standard' Iris data set, added our own new flavor of data (Iris Mythica), and then used a third-party MLaaS (Machine Learning as a Service) platform to do some simple classification of the data.
Objectives / Key Steps:
- Install the R programming language (base)
- Install RStudio
- Import the CSV file
- Install ggplot2 and a machine learning library (randomForest)
- Run code to visualize the data
- Run code for machine learning
Ok, I'm doing a fresh install on an old laptop running Windows 7 Home Premium, and I'll be documenting my steps. There are lots of super helpful communities if you want to google "how do I install R on ...."
Step 1: Install R
RStudio (next step) requires R 2.11.1 or higher. If you don't already have R, you can download it from http://cran.rstudio.com/ (pick the base download, which the site describes as "This is what you want to install R for the first time."). I downloaded the R 3.0.2 EXE file (51 MB).
Check that it's running OK by starting R and typing demo(graphics); you'll know right away if the install was good.
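demo(graphics) is a good visual check. If you also want a printed confirmation, this small sketch (not part of the original R file) reports the installed version and tests it against the R 2.11.1 minimum that RStudio needs; getRversion() and R.version.string are both part of base R.

```r
# Print which R is installed and whether it meets RStudio's stated minimum.
min_for_rstudio <- "2.11.1"  # minimum R version quoted in Step 1
r_ok <- getRversion() >= min_for_rstudio  # numeric_version vs. string comparison
cat("Installed:", R.version.string, "\n")
cat(sprintf("Meets RStudio minimum (%s): %s\n", min_for_rstudio, r_ok))
```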
Step 2: Install RStudio
"Powerful IDE for R - RStudio IDE is a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux". http://www.rstudio.com/ (you don't need RStudio, but I've found things are a lot easier with it)
I downloaded RStudio 0.97.551 - Windows XP/Vista/7/8
Check the install and start playing around by typing demo() or demo(image) in the console.
Step 3: Get the CSV File and the R File
- R Code: https://drive.google.com/file/d/0BwjxYjWyopXhTjZxYWZVQ0QyeUM4cnAyX2lwckVNWjQ2VjJF/edit?usp=sharing
- CSV file of 200 data points. It should have an index, 4 attributes for each sample, and a class (classification) column.
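Once the CSV is read into R, a quick structural check can catch a wrong or truncated file early. The check_layout() function below is a hypothetical helper, not part of the downloadable R code, and it assumes the column order is index, four numeric attributes, then the class label.

```r
# Hypothetical helper: returns a character vector of problems (empty if OK).
# Assumes column order: index, 4 numeric attributes, class label.
check_layout <- function(df, n_expected = 200L) {
  problems <- character(0)
  if (nrow(df) != n_expected) {
    problems <- c(problems,
                  sprintf("expected %d rows, found %d", n_expected, nrow(df)))
  }
  if (ncol(df) != 6) {
    problems <- c(problems, sprintf("expected 6 columns, found %d", ncol(df)))
  } else if (!all(vapply(df[2:5], is.numeric, logical(1)))) {
    # Columns 2 to 5 are assumed to hold the 4 measured attributes.
    problems <- c(problems, "columns 2 to 5 should be numeric attributes")
  }
  problems
}
```

Call it as check_layout(foo) after reading the file; character(0) means the layout matches, otherwise each string describes one problem.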
PAY ATTENTION TO WHERE YOU SAVE THESE! Mucking around with paths in R can be a very uninteresting time-suck. I put them in Documents/R/mythica (both the data and the .R source).
Ok, now that the files are in place:
Step 3b: Open the R file from the console. It should look something like the screenshot below.
Now change two things:
- First, the setwd() directory must match where you put the files.
- Mine was C:/Users/Home/Documents/R/mythica. NOTE THE SLASHES ARE FORWARD SLASHES: R treats a backslash in a string as an escape character, so the Windows-style C:\Users\Home\Documents\R\mythica form fails unless every backslash is doubled.
- Second, change the file name to match your CSV file.
- Then put your cursor on each line and hit Ctrl+Enter; RStudio sends that line to the R console and executes it. If all goes well, summary(foo) should return a nice summary (not an error).
When I first started playing around with R, much of my 'lost time' was spent fussing with paths, directories, and file names. Once those were set for the first demos, I never looked back!
summary(foo) will let you know if you got it right
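For reference, here is a self-contained sketch of the setwd() / read.csv() / summary() sequence. So it runs on any machine, it first writes R's built-in iris data to a temporary folder: tempdir() stands in for your own folder, and iris_demo.csv is a made-up file name. In your own copy you would point setwd() at your real folder, written with forward slashes.

```r
# Stand-in for your own folder, e.g. setwd("C:/Users/Home/Documents/R/mythica")
data_dir <- tempdir()

# Create a demo CSV (row index + 4 attributes + class) so the sketch runs as-is.
write.csv(iris, file.path(data_dir, "iris_demo.csv"))

old_wd <- setwd(data_dir)          # setwd() returns the previous directory
foo <- read.csv("iris_demo.csv")   # hypothetical file name: use your CSV's name
print(summary(foo))
setwd(old_wd)                      # go back to where we started
```

write.csv() saves the row numbers as a first column, which read.csv() reads back as a column named X, playing the role of the index column.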
Step 4: Install ggplot2 and the Machine Learning Library (randomForest)
If you try to run the line library(ggplot2) (Ctrl+Enter), you will see "Error in library(ggplot2) : there is no package called ‘ggplot2’". That's because you need to INSTALL a package first (for example with install.packages("ggplot2")) before you can load it with the library() command in your code.
Don't forget to go into the Packages tab in the bottom-right pane and tick the boxes for ggplot2 and randomForest to activate them.
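The install-then-load rule can also be handled in code. This sketch installs ggplot2 and randomForest only if they are missing, then attaches them with library(), which is what ticking a box in the Packages tab does. The CRAN mirror URL is an assumption; any CRAN mirror works.

```r
# install.packages() is needed once per machine; library() once per session.
pkgs <- c("ggplot2", "randomForest")
for (pkg in pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumed mirror; any CRAN mirror will do.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)  # same effect as ticking its box
}
```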
Step 5: Run Code to Visualize Data
Step through the code; you should start to see color.
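Since the post's visualization code isn't reproduced here, this is an assumed minimal ggplot2 example of where the color comes from: mapping the class column to colour gives each class its own color. It uses R's built-in iris data and its column names (Petal.Length, Petal.Width, Species) as a stand-in for your CSV.

```r
library(ggplot2)

# iris stands in for your CSV; swap in foo and its column names.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point(size = 2) +
  labs(title = "Petal size by class",
       x = "Petal length (cm)", y = "Petal width (cm)")
print(p)  # draws the plot; colours come from mapping Species to colour
```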
Step 6: Run Code for Machine Learning
(To come... I'll post the code when I'm done with it.)
But if you enter the random forest code, you should see:
Type of random forest: unsupervised
Number of trees: 500
No. of variables tried at each split: 2
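The final code isn't posted yet, so this is only a guess at a call that produces output of that shape, using the built-in iris data in place of the Mythica CSV. Calling randomForest() with attributes x but no response y runs it in unsupervised mode; ntree defaults to 500, and with 4 attributes the default mtry is floor(sqrt(4)) = 2, which matches the numbers above.

```r
library(randomForest)

set.seed(1)  # forests are random; fix the seed so reruns match
# No response (y) is given, so randomForest() runs in unsupervised mode.
# Columns 1 to 4 of iris are the 4 numeric attributes.
rf <- randomForest(x = iris[, 1:4], ntree = 500)
print(rf)
```

Passing the class as a response instead, e.g. randomForest(Species ~ ., data = iris), fits a classification forest, which is closer to the classification done on the MLaaS platform.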
About this blog
Data Analytics & Visualization Blog - Generating insights from Data since 2013
Created: July 25, 2014