Machine Learning and Self-Organizing Maps (SOMs) on Cancer Data

Now that I've got a few new tools in the toolbelt (like the "kohonen" library), I went back to some older data and took another run at it. Below is the cancer data: "This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. He assessed biopsies of breast tumours for 699 patients up to 15 July 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known."

It's interesting to contrast the Kohonen clusters with the Gini fit, contrasting V1 to V2.

The original posting is here: https://dreamtolearn.com/ryan/data_analytics_viz/48


## palette that runs from blue (low values) to red (high values)
coolBlueHotRed <- function(n, alpha = 1) {
  rainbow(n, end = 4/6, alpha = alpha)[n:1]
}

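library(MASS)     # provides the biopsy data set
library(kohonen)  # provides som() and somgrid()

## assumes biopsy is the MASS data set (ID, V1..V9, class); drop the ID column
cdata <- biopsy[, -1]

## now swap out benign and malignant for 0 and 1 (1 is malignant)

cdata$class <- as.numeric(cdata$class == "malignant")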
names(cdata)[1] <- "clmp_thicknss"
names(cdata)[2] <- "cell_uniformity"
names(cdata)[3] <- "cell_shp_unifrm"
names(cdata)[4] <- "marginal_adhesion"
names(cdata)[5] <- "sngl_elptcl_cell_sz"
names(cdata)[6] <- "bare_nuclei"
names(cdata)[7] <- "bland_chromatin"
names(cdata)[8] <- "normal_nucleoli"
names(cdata)[9] <- "mitoses"
names(cdata)[10] <- "Malignant"

data.sc <- scale(cdata)
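
To make the scaling step concrete, here is a minimal sketch of what scale() does to a data frame like cdata. The toy data frame and its values are made up for illustration and are not taken from the biopsy data; only the column names are borrowed from above.

```r
# Toy stand-in for cdata; the values are illustrative only
toy <- data.frame(
  clmp_thicknss = c(5, 3, 8, 1),
  bare_nuclei   = c(1, NA, 10, 2),
  Malignant     = c(0, 0, 1, 0)
)

# scale() centers each column on its mean and divides by its standard
# deviation, returning a numeric matrix rather than a data frame
toy.sc <- scale(toy)

# Each column now has mean 0 and standard deviation 1 (ignoring NAs)
col_means <- colMeans(toy.sc, na.rm = TRUE)
col_sds   <- apply(toy.sc, 2, sd, na.rm = TRUE)

# Missing values stay missing, and the 0/1 outcome column is rescaled too
print(toy.sc)
```

Note that the binary Malignant column is standardized along with the nine scores, so it takes part in the SOM distance calculation like any other variable.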

data.som <- som(data.sc,  grid = somgrid(8, 4, "hexagonal"))
plot(data.som, palette.name = coolBlueHotRed, main = "Cancer Data - 699 Samples")
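
The default plot shows the codebook vectors for every node. As a possible follow-up, the sketch below draws two other views that the kohonen package offers: a count of samples per node and a heatmap of a single variable. It assumes biopsy is the MASS data set and kohonen version 3 or later (for getCodes()); the seed, the na.omit() step, and the node_share summary are additions for illustration and are not part of the original analysis.

```r
library(MASS)     # biopsy data set
library(kohonen)  # som(), somgrid(), getCodes()

set.seed(42)  # arbitrary seed so the map is reproducible

# Rebuild the scaled matrix, assuming MASS::biopsy (ID, V1..V9, class)
cdata <- biopsy[, -1]                                   # drop the ID column
cdata$class <- as.numeric(cdata$class == "malignant")   # 1 is malignant
names(cdata)[10] <- "Malignant"
cdata <- na.omit(cdata)                                 # drop rows with missing values
data.sc <- scale(cdata)

data.som <- som(data.sc, grid = somgrid(8, 4, "hexagonal"))

# How many samples landed on each of the 32 nodes
plot(data.som, type = "counts", main = "Samples per node")

# Heatmap of one codebook component: the scaled Malignant value per node
codes <- getCodes(data.som, 1)
plot(data.som, type = "property", property = codes[, "Malignant"],
     main = "Malignant (scaled)")

# Share of malignant samples on each occupied node, from the node assignments
node_share <- tapply(cdata$Malignant, data.som$unit.classif, mean)
```

Nodes where node_share is close to 0 or 1 hold mostly benign or mostly malignant samples, which gives a quick read on how cleanly the map separates the two outcomes.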

