piątek, 17 lutego 2017

R: self organizing maps

Interesting topics, see

https://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
http://www.slideshare.net/shanelynn/2014-0117-dublin-r-selforganising-maps-for-customer-segmentation-shane-lynn


Other interesting reading
https://www.r-bloggers.com/r-an-integrated-statistical-programming-environment-and-gis/

https://www.r-bloggers.com/how-to-perform-pca-with-r/

czwartek, 16 lutego 2017

SQL playground, remove duplicates and count students that passed exam.


Patent na usuwanie duplikatów z tabeli “in place” tj bez nadpisywania jej, bez usuwania i wstawiania innej:
Do przetestowania:
Kolumny ID (nie powtarza się). imie, nazwisko, dane (może duplikować się).
DELETE *
FROM Tabela1
WHERE [id] NOT IN
  (SELECT Max(Tabela1.id) AS id
    FROM Tabela1
    GROUP BY Tabela1.imie, Tabela1.nazwisko, Tabela1.dane);

w linii GROUP BY można określić gdzie szukamy duplikatów, w jakich kolumnach
Select Max... można użyć First(), albo Min(), żeby określić, które zduplikowane rekordy zachować.
 


Są trzy tabele
1. Studenci (_indeks_, imie, nazwisko)
2. Kursy (_id_, tytul, godzin, punktow), gdzie punktow oznacza ile punktow za zaliczenie kursu się dostaje
3. Szkolenia( osoba, kurs, zaliczenie) gdzie osoba to numer indeksu studenta, zaliczenie to data zaliczenia, a kurs to numer kursu
i pierwsze zadanie z matury brzmi: Ilu studentów zaliczyło w pierwszym terminie? (do 30 czerwca 2016)<pre>
Select count([_indeks_Stud]) AS ilu_zdalo From
    (Select [Studenci$].[_indeks_Stud]
    From ([Kursy$] 

          Inner Join [Szkolenia$] On [Kursy$].[_id_kurs] = [Szkolenia$].kurs) 
          Inner Join [Studenci$]   On [Studenci$].[_indeks_Stud] = [Szkolenia$].osoba
     Where [Szkolenia$].zaliczenie <= #2016-06-30#
     Group By [Studenci$].[_indeks_Stud]
     Having Sum([Kursy$].punktow) >= 15)
<./pre>
Kod powstał w ide Flyspeed SQL Query na bazie stworzonej w zakładkach w Excelu, stąd znaki dolara.... kod więc powinien działać po wklejeniu do Microsoft Query.

Python in RStudio

Data used:
Titanic data: https://www.kaggle.com/c/titanic/data
and tutorial:  http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb

Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv

Software used:
Portable scientific winpython (with pandas scikit-learn):
https://sourceforge.net/projects/winpython/?source=typ_redirect
To work it needed windows updates (my OS is windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093

To install packages from source it needed:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed feather package so I dowloaded it and used python command: pip install
as taught here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
and installed from source: https://github.com/wesm/feather/tree/master/python

To learn how to use other languages in RStudio: http://rmarkdown.rstudio.com/authoring_knitr_engines.html

I also wanted to try if some portable version of bash would work. No problem:
http://win-bash.sourceforge.net/



Code for my playground.
---
title: "R Notebook"
output: html_notebook
---

## Bash

```{bash, engine.path="C:\\Users\\jkotows2\\Desktop\\shell.w32-ix86\\bash.exe"}
cat flights1.csv flights2.csv flights3.csv > flights.csv
```

## Python

http://rmarkdown.rstudio.com/authoring_knitr_engines.html

```{python, engine.path="C:\\Users\\jkotows2\\Desktop\\WinPython\\python-3.6.0.amd64\\python.exe"}
import pandas
import feather

# Read flights data and select flights to O'Hare
flights = pandas.read_csv("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.csv")
flights = flights[flights['dest'] == "ORD"]

# Select carrier and delay columns and drop rows with missing values
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
print (flights.head(10))

# Write to feather file for reading from R
feather.write_dataframe(flights, "C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
```

## Back to R

```{r}
library(feather)
library(ggplot2)

# Read from feather and plot
flights <- read_feather("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
ggplot(flights, aes(carrier, arr_delay)) + geom_point() + geom_jitter()
```

środa, 15 lutego 2017

lpsolve - solver in R

To study:
http://flovv.github.io/From_descritpive_to_prescriptive/
https://icyrock.com/blog/2013/12/linear-programming-in-r-using-lpsolve/
http://lpsolve.r-forge.r-project.org/
http://horicky.blogspot.co.uk/2013/01/optimization-in-r.html
http://lpsolve.sourceforge.net/5.5/R.htm