Jacek Kotowski's toolbox.: R: Braindump for this week: classification, ETL, etc.

Saturday, 9 July 2016

R: Braindump for this week: classification, ETL, etc.

1. I am pondering how to use clustering to classify unknown (unlabeled) data.

First, prepare the data (what to do with categorical and ordinal variables? use mutate to create new variables?).
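One possible answer to the questions above, as a rough sketch only: encode nominal columns as factors and ordinal ones as ordered factors, derive new variables with `dplyr::mutate`, and compute a Gower dissimilarity with `cluster::daisy`, which accepts mixed column types. The `customers` data frame and its column names are made up for illustration.

```r
library(dplyr)
library(cluster)

# Hypothetical toy data: one numeric, one nominal and one ordinal column.
customers <- data.frame(
  income       = c(32000, 54000, 41000, 78000, 29000, 66000),
  region       = c("north", "south", "south", "east", "north", "east"),
  satisfaction = c("low", "high", "medium", "high", "low", "medium"),
  stringsAsFactors = FALSE
)

prepared <- customers %>%
  mutate(
    region       = factor(region),                      # nominal
    satisfaction = factor(satisfaction,
                          levels = c("low", "medium", "high"),
                          ordered = TRUE),              # ordinal
    log_income   = log(income)                          # derived variable
  ) %>%
  select(-income)

# Gower dissimilarity handles numeric, nominal and ordinal columns together.
gower_dist <- daisy(prepared, metric = "gower")
```

The resulting `gower_dist` object can be passed to distance-based methods such as `cluster::pam(gower_dist, k = 2, diss = TRUE)`; plain k-means would not work here because it needs numeric coordinates.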

For this I found some interesting tutorials:
http://www.r-bloggers.com/clustering-mixed-data-types-in-r-2/
http://www.sthda.com/english/wiki/partitioning-cluster-analysis-quick-start-guide-unsupervised-machine-learning
I also found an algorithm that proved to be very precise:

Then run the data through a clustering algorithm (k-means, local density, other?).

First I chose to play with various algorithms to see how well they detect known clusters,
e.g. the three species in the famous iris dataset. I played with "cclust", "cluster", and "densityClust", and discovered "factoextra" for quick graphs/diagnostics.
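A minimal sketch of that kind of experiment, using base R's `kmeans` and `cluster::pam` rather than every package listed above: cluster the four iris measurements without the species column, then cross-tabulate the detected clusters against the known species. The choice of scaling, the seed, and `nstart = 25` are my own assumptions.

```r
library(cluster)

# Features only; the species labels are held back for comparison.
features <- scale(iris[, 1:4])

set.seed(42)  # k-means starts from random centers
km <- kmeans(features, centers = 3, nstart = 25)
pm <- pam(features, k = 3)

# Cross-tabulate known species against detected clusters.
confusion_km  <- table(species = iris$Species, cluster = km$cluster)
confusion_pam <- table(species = iris$Species, cluster = pm$clustering)
print(confusion_km)
print(confusion_pam)

# Optional quick plot, assuming factoextra is installed:
# factoextra::fviz_cluster(km, data = features)
```

Cluster numbers are arbitrary, so what matters in the tables is whether each species falls mostly into a single column, not which column it is.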

Finally, output a decision tree (or another non-black-box tool) to show the rules and dependencies.
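This last step could look roughly like the sketch below: treat the cluster labels as a target and fit a classification tree with `rpart`, so that the cluster assignments can be read as plain rules. Using `rpart` and k-means here is an assumption on my part; any clustering result and any interpretable classifier would fit the same pattern.

```r
library(rpart)

# Cluster on scaled features (seed and nstart are arbitrary choices).
set.seed(42)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)

# Attach cluster labels to the unscaled measurements,
# so the tree's split rules are in the original units.
labelled <- iris[, 1:4]
labelled$cluster <- factor(km$cluster)

# Fit a classification tree that explains cluster membership.
tree <- rpart(cluster ~ ., data = labelled, method = "class")
print(tree)

# Share of rows where the tree reproduces the cluster label.
predicted <- predict(tree, labelled, type = "class")
agreement <- mean(predicted == labelled$cluster)
```

The printed tree lists the split rules, and `agreement` gives a rough idea of how faithfully those rules describe the clusters.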

2. Getting tidy output: broom (in .R scripts), pander (in .Rmd documents), memisc::mtable + pander to compare linear models (http://stackoverflow.com/questions/24342162/regression-tables-in-markdown-format-for-flexible-use-in-r-markdown-v2)
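As a small illustration of the broom part, assuming two toy models on `mtcars` (my choice of data and formulas): `tidy()` returns one row per coefficient and `glance()` one row of fit statistics per model, so two models can be stacked into ordinary data frames and compared side by side.

```r
library(broom)

# Two hypothetical models to compare.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# One row per coefficient, labelled by model.
coefs <- rbind(
  data.frame(model = "m1", tidy(m1)),
  data.frame(model = "m2", tidy(m2))
)

# One row of fit statistics (r.squared, AIC, ...) per model.
fit_stats <- rbind(
  data.frame(model = "m1", glance(m1)),
  data.frame(model = "m2", glance(m2))
)

print(coefs)
print(fit_stats)

# In an .Rmd chunk, a data frame like this can be rendered as a table with:
# pander::pander(coefs)
```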

3. ETL concept:
https://cran.r-project.org/web/packages/dplyr/vignettes/databases.html
https://github.com/beanumber/etl
http://www.r-bloggers.com/r-and-sqlite-part-1/
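To make the concept concrete, here is a bare-bones extract/transform/load sketch using only DBI and RSQLite with an in-memory database. It does not use the etl package linked above; the table name `cars` and the derived `kpl` column are invented for the example.

```r
library(DBI)

# Extract: start from a data frame (a stand-in for a raw source file).
raw <- mtcars
raw$model <- rownames(mtcars)

# Transform: convert miles per gallon to km per litre, keep a few columns.
clean <- transform(raw, kpl = mpg * 0.425144)
clean <- clean[, c("model", "cyl", "kpl")]
rownames(clean) <- NULL

# Load: write the cleaned table into an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", clean)

# Query the loaded table with plain SQL.
summary_by_cyl <- dbGetQuery(
  con,
  "SELECT cyl, AVG(kpl) AS mean_kpl FROM cars GROUP BY cyl ORDER BY cyl"
)
dbDisconnect(con)

print(summary_by_cyl)
```

Replacing `":memory:"` with a file path would persist the database between sessions.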
