Saturday, 9 July 2016

R: Braindump for this week: classification, ETL, etc.

1. I am pondering the use of clustering to classify unknown data.
  • first prepare the data (what to do with categorical and ordinal variables? mutate, i.e. create new variables?)
    • for this I found some interesting tutorials:
    • I also found an algorithm that proved to be very precise:
  • then run it through a clustering algorithm (k-means, local density, other?)
    • first I chose to play with various algorithms to see how well they detect known clusters,
      e.g. the three species in the famous iris dataset. I played with "cclust", "cluster", and "densityClust", and discovered "factoextra" for quick graphs/diagnostics
  • finally fit a decision tree (or another non-black-box tool) on the result to show the rules and dependencies.
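
A minimal sketch of the three steps above on iris, to make the idea concrete. It is an illustration under assumptions, not the exact workflow from the packages named above: it uses base R's kmeans() instead of cclust or densityClust, and rpart for the tree (the post does not name a tree package). k = 3 and the seed are arbitrary choices. Since iris is all numeric, the categorical/ordinal question is not addressed here.

```r
# Sketch only: base kmeans() and rpart stand in for the packages in the post.
library(rpart)  # recommended package, shipped with standard R installs

set.seed(42)  # k-means starts from random centers, so fix the seed

# 1. Prepare: drop the known label and standardize the numeric columns.
features <- scale(iris[, 1:4])

# 2. Cluster: k-means with k = 3, because iris has three known species.
km <- kmeans(features, centers = 3, nstart = 25)

# Cross-tabulate clusters against the known species to see how well
# the algorithm recovered them.
print(table(cluster = km$cluster, species = iris$Species))

# 3. Explain: fit a decision tree that predicts the cluster label from the
#    original (unscaled) measurements, so the split rules stay readable.
labelled <- cbind(iris[, 1:4], cluster = factor(km$cluster))
tree <- rpart(cluster ~ ., data = labelled, method = "class")
print(tree)
```

The printed tree lists the split rules (variable and threshold) that separate the clusters, which is the "non-black-box" part: the clustering finds the groups, the tree describes them.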
2. Getting tidy output: broom (in .R scripts), pander (in .Rmd documents), memisc::mtable + pander to compare linear models (
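
A short sketch of how these tools might be combined, assuming broom, pander, and memisc are installed. The two models on iris are made up for illustration; the names m1, m2, coefs, and fit_stats are hypothetical.

```r
# Sketch only: the models below are arbitrary examples, not from the post.
library(broom)
library(pander)
library(memisc)

m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

# broom: turn model output into plain data frames for further processing.
coefs <- tidy(m1)        # one row per coefficient
fit_stats <- glance(m1)  # one row of model-level statistics
print(coefs)
print(fit_stats)

# pander: render the model as a markdown table (useful inside .Rmd chunks).
pander(m1)

# memisc::mtable + pander: put the two models side by side for comparison.
comparison <- mtable("Model 1" = m1, "Model 2" = m2)
pander(comparison)
```

In an .Rmd document the pander() calls would normally sit in a chunk with results='asis' so the markdown is rendered rather than shown verbatim.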

3. ETL concept:
