Monday, 28 December 2015

Day 1 with R

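What I have learned today:
> setwd("C:/R") - set the directory where I will put my files (use forward slashes or "C:\\R"; a single backslash as in "C:\R" is an invalid escape in an R string),
> getwd() - check if it is set correctly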
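
A small sketch of the same two calls that should run on any machine. Here tempdir() is only my stand-in for "C:/R" (an assumption, so that the folder is guaranteed to exist), and the old directory is restored at the end.

```r
# Stand-in for "C:/R": tempdir() always exists in a running R session (assumption).
target <- tempdir()

# Remember where we are so we can go back afterwards.
old <- getwd()

# Only switch if the folder really exists; setwd() fails on a missing folder.
if (dir.exists(target)) {
  setwd(target)
}
print(getwd())

# Restore the original working directory.
setwd(old)
```

setwd() also returns the previous directory invisibly, so old <- setwd(target) would do the same in one step.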

Let's get some data ...

I created a CSV file named test.csv with Notepad:
Nazwisko,Wartosc,Wartosc2
Kowalski,23.22,32.12
Nowak,21.21,34.12
Sikorowski,20.2,11.3
Then I read it with R:

> b <- read.csv("test.csv")
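
To check that the file was parsed the way I expected, something like the sketch below should help. To keep it self-contained I pass the same CSV content through the text argument of read.csv instead of reading test.csv from disk; csv_text is just a name I made up for this example.

```r
# Same content as test.csv, kept inline so the example needs no file on disk.
csv_text <- "Nazwisko,Wartosc,Wartosc2
Kowalski,23.22,32.12
Nowak,21.21,34.12
Sikorowski,20.2,11.3"

b <- read.csv(text = csv_text)

# Column types: the name column is text, the two value columns are numeric.
str(b)

# Quick numeric summary of one column.
summary(b$Wartosc)
```

If a numeric column shows up as character in str(), the file probably uses a different separator or decimal mark.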

Play with charts:

> plot(b$Wartosc, b$Wartosc2)

> plot(density(b$Wartosc)) - basic density plot

> boxplot(b$Wartosc) - basic boxplot (boxplot() draws the chart itself; wrapping it in plot() ends with an error)


Play with k-means clusters:
> clusters <- kmeans(b[-1], 2)
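
kmeans starts from randomly chosen centers, so the cluster numbers can differ between runs. A minimal sketch of a more repeatable call, assuming the same three rows as in my CSV; the seed value 42 and nstart = 10 are arbitrary choices of mine.

```r
# Same data as in test.csv, built inline to keep the example self-contained.
b <- data.frame(
  Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
  Wartosc  = c(23.22, 21.21, 20.2),
  Wartosc2 = c(32.12, 34.12, 11.3)
)

# Fix the random seed so repeated runs give the same labels.
set.seed(42)

# b[-1] drops the name column; nstart = 10 tries several random starts
# and keeps the best result.
clusters <- kmeans(b[-1], centers = 2, nstart = 10)

clusters$cluster  # cluster number for each row
clusters$centers  # mean of Wartosc and Wartosc2 in each cluster
clusters$size     # how many rows ended up in each cluster
```

The label numbers themselves (1 or 2) carry no meaning; only which rows share a label matters.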

Add the cluster number as a new (fourth) column:
> b <- cbind(b, clusters$cluster)

Plot x y with clusters distinguished by color (the cluster number is in column 4; column 3 is Wartosc2):
> plot(b$Wartosc, b$Wartosc2, col=b[,4])

To add labels I tried:
> text(b$Wartosc, b$Wartosc2, labels=b$Nazwisko, cex= 0.7, pos=3)
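
Counting column positions by hand is easy to get wrong, so here is a sketch of the same plot with the cluster column addressed by name. The column name cluster, the seed, and pch = 19 are my own choices, not anything required by R.

```r
# Same data as in test.csv, built inline to keep the example self-contained.
b <- data.frame(
  Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
  Wartosc  = c(23.22, 21.21, 20.2),
  Wartosc2 = c(32.12, 34.12, 11.3)
)

set.seed(42)
clusters <- kmeans(b[c("Wartosc", "Wartosc2")], centers = 2, nstart = 10)

# Store the cluster number under an explicit name instead of relying on position.
b$cluster <- clusters$cluster

# Color each point by its cluster; pch = 19 draws filled circles.
plot(b$Wartosc, b$Wartosc2, col = b$cluster, pch = 19,
     xlab = "Wartosc", ylab = "Wartosc2")

# Put each name just above its point.
text(b$Wartosc, b$Wartosc2, labels = b$Nazwisko, cex = 0.7, pos = 3)
```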

Then I played with http://stats.stackexchange.com/questions/109273/creating-a-cluster-analysis-on-multiple-variables

> numbers_only <- b[c(-1,-4)]
> rownames(numbers_only) <- b$Nazwisko
> d <- dist(numbers_only, method="euclidean")
> fit <- hclust(d, method = "ward.D")
> plot(fit)
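
To see what the dendrogram is built from, the distance matrix and the merge steps can be printed directly. A sketch using the same three rows; I build numbers_only inline here rather than from b so the example stands alone.

```r
# The two numeric columns, with the names as row names.
numbers_only <- data.frame(
  Wartosc  = c(23.22, 21.21, 20.2),
  Wartosc2 = c(32.12, 34.12, 11.3),
  row.names = c("Kowalski", "Nowak", "Sikorowski")
)

# Pairwise Euclidean distances between the rows.
d <- dist(numbers_only, method = "euclidean")
print(round(d, 2))

fit <- hclust(d, method = "ward.D")

# Each row of merge is one step; negative numbers are single observations.
fit$merge

# Height on the dendrogram at which each merge happens.
fit$height
```

Kowalski and Nowak are about 2.84 apart and get merged first; Sikorowski joins much higher up the tree.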

What if the tree is too large and I want fewer, more general clusters? Then I need to cut the tree: http://stackoverflow.com/questions/6518133/clustering-list-for-hclust-function

> cutree(fit, h=10)
  Kowalski      Nowak Sikorowski
         1          1          2
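
Besides a height h, cutree also accepts the number of clusters k, which is handy when I do not know a sensible height in advance. A sketch under the same data; groups_k and groups_h are names I made up for this example.

```r
# Rebuild the tree from the same three rows so the example stands alone.
numbers_only <- data.frame(
  Wartosc  = c(23.22, 21.21, 20.2),
  Wartosc2 = c(32.12, 34.12, 11.3),
  row.names = c("Kowalski", "Nowak", "Sikorowski")
)
fit <- hclust(dist(numbers_only, method = "euclidean"), method = "ward.D")

# Cut by number of clusters instead of by height.
groups_k <- cutree(fit, k = 2)

# Cut by height, as in the call above.
groups_h <- cutree(fit, h = 10)

# For this data both cuts give the same grouping.
identical(groups_k, groups_h)

# List the names that fall into each cluster.
split(names(groups_k), groups_k)
```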

Shit, it is 01:00 AM. Good night



