- first, prepare the data (how to handle categorical and ordinal variables? mutate, i.e. create new variables?)
- for this I found some interesting tutorials:
http://www.r-bloggers.com/clustering-mixed-data-types-in-r-2/
http://www.sthda.com/english/wiki/partitioning-cluster-analysis-quick-start-guide-unsupervised-machine-learning
- I also found an algorithm that proved to be very precise:
- then run the data through a clustering algorithm (k-means, local density, other?)
- first I chose to play with various algorithms to see how well they detect known clusters, e.g. the three species in the famous iris dataset. I played with "cclust", "cluster", and "densityClust", and discovered "factoextra" for quick graphs/diagnostics
- finally, produce a decision tree (or another non-black-box tool) to show rules and dependencies.
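For the data preparation question (categorical and ordinal variables), one common option is Gower distance followed by PAM, both from the "cluster" package. The sketch below is only an illustration: the `customers` data frame, its column names, and the choice of two clusters are made up for this example and do not come from the tutorials above.

```r
library(cluster)  # provides daisy() and pam()

# Hypothetical toy data: one numeric, one nominal and one ordinal column.
customers <- data.frame(
  income  = c(21, 25, 23, 60, 64, 58, 22, 61),
  channel = factor(c("web", "web", "shop", "shop", "shop", "web", "web", "shop")),
  rating  = factor(c("low", "low", "mid", "high", "high", "mid", "low", "high"),
                   levels = c("low", "mid", "high"), ordered = TRUE)
)

# "Mutate" step: derive a new variable from an existing one.
customers$log_income <- log(customers$income)

# Gower distance combines numeric, nominal and ordered-factor columns
# into a single dissimilarity matrix.
gower_dist <- daisy(customers[, c("log_income", "channel", "rating")],
                    metric = "gower")

# PAM (k-medoids) works directly on a dissimilarity matrix.
# k = 2 is an arbitrary choice for this toy example.
fit <- pam(gower_dist, k = 2, diss = TRUE)
customers$cluster <- fit$clustering
print(customers)
```

Declaring `rating` as an ordered factor matters here: `daisy()` treats ordered factors as ordinal rather than as unrelated categories.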
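The three steps above (prepare, cluster, explain) can be sketched on iris roughly as follows. This is a minimal sketch, not the exact workflow I ran: it assumes plain k-means from base R rather than "cclust" or "densityClust", scaling as the only preparation step, and "rpart" as the decision tree package.

```r
library(rpart)  # decision trees; a recommended package shipped with R

set.seed(42)  # k-means starts from random centers

# 1. Prepare: keep the four numeric measurements and put them on one scale.
features <- scale(iris[, 1:4])

# 2. Cluster: k-means with three centers, since iris has three known species.
km <- kmeans(features, centers = 3, nstart = 25)

# Check how well the clusters line up with the known species.
print(table(cluster = km$cluster, species = iris$Species))

# 3. Explain: fit a decision tree that predicts the cluster label from the
#    original (unscaled) measurements, so the split rules are readable.
labelled <- data.frame(iris[, 1:4], cluster = factor(km$cluster))
tree <- rpart(cluster ~ ., data = labelled, method = "class")
print(tree)
```

The cross-table shows which species each cluster mostly captures, and the printed tree gives the split rules that separate the clusters.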
3. ETL concept:
https://cran.r-project.org/web/packages/dplyr/vignettes/databases.html
https://github.com/beanumber/etl
http://www.r-bloggers.com/r-and-sqlite-part-1/
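A tiny extract-transform-load round trip with SQLite could look like the sketch below. It assumes the "DBI" and "RSQLite" packages are installed; the table name `iris_clean` and the derived column `petal_area` are invented for this example, and iris only stands in for a real raw source.

```r
# Assumes the DBI and RSQLite packages are installed.
library(DBI)

# Extract: the built-in iris data frame stands in for a raw source.
raw <- iris

# Transform: SQL-friendly column names plus one derived variable.
clean <- raw
names(clean) <- tolower(gsub(".", "_", names(clean), fixed = TRUE))
clean$species <- as.character(clean$species)
clean$petal_area <- clean$petal_length * clean$petal_width

# Load: write the cleaned table into an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris_clean", clean)

# Query the loaded table with plain SQL.
species_summary <- dbGetQuery(con, "
  SELECT species, COUNT(*) AS n, AVG(petal_area) AS mean_petal_area
  FROM iris_clean
  GROUP BY species
  ORDER BY species
")
print(species_summary)

dbDisconnect(con)
```

Renaming the columns before loading avoids having to quote names like `Sepal.Length` in every SQL statement.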