> setwd("C:/R") - set the working directory where I will keep my files (R paths need "/" or "\\"; "C:\R" with a single backslash is an error)
> getwd() - check that it is set correctly
Let's get some data ...
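I created a CSV file, test.csv, with Notepad (the Polish column names mean: Nazwisko = surname, Wartosc = value, Wartosc2 = value 2):
Nazwisko,Wartosc,Wartosc2
Kowalski,23.22,32.12
Nowak,21.21,34.12
Sikorowski,20.2,11.3
Then I read it with R:
> b <- read.csv("test.csv")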
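To follow along without creating the file in Notepad, you can pass the same three rows to read.csv() as a string through its text argument. This is a minimal sketch: the variable name csv_text is just for illustration, and the data are exactly the rows above.

```r
# Same content as test.csv, kept in a string instead of a file.
# csv_text is a hypothetical name used only for this example.
csv_text <- "Nazwisko,Wartosc,Wartosc2
Kowalski,23.22,32.12
Nowak,21.21,34.12
Sikorowski,20.2,11.3"

# read.csv() passes `text` on to read.table(), which parses the string as if it were a file
b <- read.csv(text = csv_text)

str(b)  # check the column names and types
```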
Play with charts:
> plot(b$Wartosc, b$Wartosc2) - scatter plot
> plot(density(b$Wartosc)) - basic density plot
> boxplot(b$Wartosc) - basic boxplot (boxplot() draws the chart itself, so it should not be wrapped in plot())
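To see the three charts side by side in one window, a possible sketch is to split the plotting area with par(mfrow = ...). The titles and axis labels below are illustrative additions, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

op <- par(mfrow = c(1, 3))             # 1 row, 3 panels; keep the old settings in op
plot(b$Wartosc, b$Wartosc2,
     xlab = "Wartosc", ylab = "Wartosc2", main = "Scatter plot")
plot(density(b$Wartosc), main = "Density of Wartosc")
boxplot(b$Wartosc, main = "Boxplot of Wartosc")
par(op)                                # restore the previous single-panel layout
```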
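Play with k-means clustering:
> clusters <- kmeans(b[-1], 2) - 2 clusters on the numeric columns (b[-1] drops Nazwisko); the starting centres are random, so the result can change between runs unless you call set.seed() first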
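Because kmeans() picks its starting centres at random, a reproducible variant fixes the seed and tries several starts, keeping the best one. The seed value 42 and nstart = 10 are arbitrary choices for this sketch, not something the original session used.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

set.seed(42)                                   # arbitrary seed: same random starts every run
clusters <- kmeans(b[-1], centers = 2, nstart = 10)  # 10 random starts, best one kept

clusters$cluster   # cluster number of each row
clusters$centers   # mean Wartosc and Wartosc2 of each cluster
clusters$size      # number of rows in each cluster
```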
Add a column with the cluster number to the data (it becomes the 4th column of b):
> b <- cbind(b, clusters$cluster)
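Instead of cbind(), which names the new column "clusters$cluster" and puts it in position 4, one option is to add it under a plain name. The sketch below uses the name cluster (an illustrative choice) and then checks that the per-cluster means computed with aggregate() match clusters$centers.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

set.seed(42)                                  # arbitrary seed for reproducibility
clusters <- kmeans(b[-1], centers = 2, nstart = 10)

b$cluster <- clusters$cluster                 # named column instead of cbind()

# Mean of each numeric column per cluster; should equal clusters$centers
means <- aggregate(cbind(Wartosc, Wartosc2) ~ cluster, data = b, FUN = mean)
means
```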
Plot x vs. y with the clusters distinguished by color (the cluster number is in column 4; column 3 is Wartosc2):
> plot(b$Wartosc, b$Wartosc2, col=b[,4])
To add the surnames as point labels:
> text(b$Wartosc, b$Wartosc2, labels=b$Nazwisko, cex=0.7, pos=3)
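Putting the coloured scatter plot, the labels and a legend together, a sketch could colour the points with clusters$cluster directly, so it does not depend on which column the cluster number landed in. The axis padding, pch = 19 and the legend position are cosmetic assumptions.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

set.seed(42)                                  # arbitrary seed for reproducibility
clusters <- kmeans(b[-1], centers = 2, nstart = 10)

plot(b$Wartosc, b$Wartosc2,
     col = clusters$cluster, pch = 19,        # colour = cluster number (1, 2)
     xlab = "Wartosc", ylab = "Wartosc2",
     xlim = range(b$Wartosc) + c(-1, 1),      # extra room so labels are not clipped
     ylim = range(b$Wartosc2) + c(-3, 3))
text(b$Wartosc, b$Wartosc2, labels = b$Nazwisko, cex = 0.7, pos = 3)  # labels above points
legend("bottomright", legend = paste("cluster", 1:2), col = 1:2, pch = 19)
```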
Then I tried hierarchical clustering, following http://stats.stackexchange.com/questions/109273/creating-a-cluster-analysis-on-multiple-variables
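> numbers_only <- b[c(-1,-4)] - keep only Wartosc and Wartosc2 (drop Nazwisko and the cluster column)
> rownames(numbers_only) <- b$Nazwisko - surnames as row names, so they label the tree
> d <- dist(numbers_only, method="euclidean") - distance matrix
> fit <- hclust(d, method = "ward.D") - Ward linkage (according to ?hclust, "ward.D2" is the variant that implements Ward's criterion on unsquared distances)
> plot(fit) - dendrogram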
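Since ?hclust distinguishes "ward.D" from "ward.D2", a quick sketch can compare both on the same distances. Here the columns are selected by name rather than by position (an alternative to b[c(-1,-4)]), and the names fit_d and fit_d2 are illustrative. With these three rows both methods merge Kowalski and Nowak first at the same height; only the final merge height differs.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

numbers_only <- b[c("Wartosc", "Wartosc2")]   # select numeric columns by name
rownames(numbers_only) <- b$Nazwisko          # row names become tree labels
d <- dist(numbers_only, method = "euclidean")

fit_d  <- hclust(d, method = "ward.D")        # Ward update applied to unsquared distances
fit_d2 <- hclust(d, method = "ward.D2")       # Ward's criterion on squared distances

fit_d$height                                  # merge heights, ward.D
fit_d2$height                                 # merge heights, ward.D2
plot(fit_d2, main = "ward.D2")
```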
What if the tree is too large and I want fewer, more general clusters? Then I need to cut the tree (see http://stackoverflow.com/questions/6518133/clustering-list-for-hclust-function):
> cutree(fit, h=10) - cut the tree at height 10
Kowalski Nowak Sikorowski
1 1 2
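Besides cutting at a height, cutree() can also be asked for a fixed number of clusters with k, and rect.hclust() can draw boxes around those clusters on the dendrogram. In this sketch the names groups_h and groups_k are illustrative; on these three rows h = 10 and k = 2 give the same grouping as the output above.

```r
# Same three rows as test.csv
b <- data.frame(Nazwisko = c("Kowalski", "Nowak", "Sikorowski"),
                Wartosc  = c(23.22, 21.21, 20.2),
                Wartosc2 = c(32.12, 34.12, 11.3))

numbers_only <- b[c("Wartosc", "Wartosc2")]
rownames(numbers_only) <- b$Nazwisko
fit <- hclust(dist(numbers_only, method = "euclidean"), method = "ward.D")

groups_h <- cutree(fit, h = 10)          # cut at height 10
groups_k <- cutree(fit, k = 2)           # ask for exactly 2 clusters
split(names(groups_k), groups_k)         # surnames grouped by cluster number

plot(fit)
rect.hclust(fit, k = 2, border = "red")  # outline the 2 clusters on the dendrogram
```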
Shit, it is 01:00 AM. Good night