poniedziałek, 20 marca 2017

Stats Day1

With Spring day 1 I start regular intro statistics study:

My notes from day 1:
Standard Error of Mean (SD of a sample):
SD/ root(n)  - the larger sample size, the error decreases,


Z-Score: (x-mean)/SD
https://en.wikipedia.org/wiki/Standard_score

T-Score:
Conversion Score = z * (NewSD)+NewMean
NewSD for T-Score = 10 and NewMean = 50
T-Score = Z-Score*10+50
More: Essentials of Testing and Assessment: A Practical Guide for Counselors ...Autorzy Edward S. Neukrug,R. Charles Fawcett p 133 (fragment available online via Google)


Coefficient of Variation
CV% = (SD/Mean)*100

ExcelVBA: From recorded macro to reusable function.

Problem: with each change in slicer, pivot chart formatting changes.
 
So I want all my lables vertical (and with each change the damn thing returns to horizontal). My recorded macro did not return anything sensible. Lecture on chart label properties brought me to this solution:

Sub Macro1()
'
' Wersja "Recorder"
'
Dim mySrs As Series

With ActiveSheet.ChartObjects("chValueofOffers").Chart
        

    .SeriesCollection(1).DataLabels.Orientation = xlUpward
    .SeriesCollection(2).DataLabels.Orientation = xlUpward


End With



Next I wanted to iterate through series collection so that I do not need to change macro for more series:

Sub Macro2()
'
' Wersja "Obiektowo pętlowa"

'

Set seriesCol = ActiveSheet.ChartObjects("chValueofOffers").Chart.SeriesCollection
        
For Each mySeries In seriesCol
    mySeries.DataLabels.Orientation = xlUpward
Next

Set seriesCol = ActiveSheet.ChartObjects("chValueofOffersTotal").Chart.SeriesCollection


   
End Sub

 Finally... what if I have several charts on the same sheet that need vertical labels. Why not making it a reusable function with chart name as attribute. Now... with each change in the slicer the labels in the two charts get corrected to vertical.


Private Function DataLabelsVertical(mySheet As String, chtName As String)

 Dim mySeries As Series
 Set seriesCol = Worksheets(mySheet).ChartObjects(chtName).Chart.SeriesCollection

    For Each mySeries In seriesCol
       mySeries.DataLabels.Orientation = xlUpward
    Next

End Function

Private Sub Worksheet_Change(ByVal Target As Range)
   DataLabelsVertical "Value_tenders", "chValueofOffers"
   DataLabelsVertical "Value_tenders", "chValueofOffersTotal"
   DataLabelsVertical "Status", "chStatusValue"
   DataLabelsVertical "Status", "chStatusValueTotal"
End Sub


End Function


 Sources:
http://www.java2s.com/Code/VBA-Excel-Access-Word/Excel/Loopthrougheachseriesinchartandaltermarkercolors.htm
http://stackoverflow.com/questions/21165581/vba-looping-through-all-series-within-all-charts



piątek, 17 lutego 2017

R: self organizing maps

Interesting topics, see

https://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
http://www.slideshare.net/shanelynn/2014-0117-dublin-r-selforganising-maps-for-customer-segmentation-shane-lynn


Other interesting reading
https://www.r-bloggers.com/r-an-integrated-statistical-programming-environment-and-gis/

https://www.r-bloggers.com/how-to-perform-pca-with-r/

czwartek, 16 lutego 2017

SQL playground, remove duplicates and count students that passed exam.


Patent na usuwanie duplikatów z tabeli “in place” tj bez nadpisywania jej, bez usuwania i wstawiania innej:
Do przetestowania:
Kolumny ID (nie powtarza się). imie, nazwisko, dane (może duplikować się).
DELETE *
FROM Tabela1
WHERE [id] NOT IN
  (SELECT Max(Tabela1.id) AS id
    FROM Tabela1
    GROUP BY Tabela1.imie, Tabela1.nazwisko, Tabela1.dane);

w linii GROUP BY można określić gdzie szukamy duplikatów, w jakich kolumnach
Select Max... można użyć First(), albo Min(), żeby określić, które zduplikowane rekordy zachować.
 


Są trzy tabele
1. Studenci (_indeks_, imie, nazwisko)
2. Kursy (_id_, tytul, godzin, punktow), gdzie punktow oznacza ile punktow za zaliczenie kursu się dostaje
3. Szkolenia( osoba, kurs, zaliczenie) gdzie osoba to numer indeksu studenta, zaliczenie to data zaliczenia, a kurs to numer kursu
i pierwsze zadanie z matury brzmi: Ilu studentów zaliczyło w pierwszym terminie? (do 30 czerwca 2016)<pre>
Select count([_indeks_Stud]) AS ilu_zdalo From
    (Select [Studenci$].[_indeks_Stud]
    From ([Kursy$] 

          Inner Join [Szkolenia$] On [Kursy$].[_id_kurs] = [Szkolenia$].kurs) 
          Inner Join [Studenci$]   On [Studenci$].[_indeks_Stud] = [Szkolenia$].osoba
     Where [Szkolenia$].zaliczenie <= #2016-06-30#
     Group By [Studenci$].[_indeks_Stud]
     Having Sum([Kursy$].punktow) >= 15)
<./pre>
Kod powstał w ide Flyspeed SQL Query na bazie stworzonej w zakładkach w Excelu, stąd znaki dolara.... kod więc powinien działać po wklejeniu do Microsoft Query.

Python in RStudio

Data used:
Titanic data: https://www.kaggle.com/c/titanic/data
and tutorial:  http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb

Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv

Software used:
Portable scientific winpython (with pandas scikit-learn):
https://sourceforge.net/projects/winpython/?source=typ_redirect
To work it needed windows updates (my OS is windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093

To install packages from source it needed:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed feather package so I dowloaded it and used python command: pip install
as taught here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
and installed from source: https://github.com/wesm/feather/tree/master/python

To learn how to use other languages in RStudio: http://rmarkdown.rstudio.com/authoring_knitr_engines.html

I also wanted to try if some portable version of bash would work. No problem:
http://win-bash.sourceforge.net/



Code for my playground.
---
title: "R Notebook"
output: html_notebook
---

## Bash

```{bash, engine.path="C:\\Users\\jkotows2\\Desktop\\shell.w32-ix86\\bash.exe"}
cat flights1.csv flights2.csv flights3.csv > flights.csv
```

## Python

http://rmarkdown.rstudio.com/authoring_knitr_engines.html

```{python, engine.path="C:\\Users\\jkotows2\\Desktop\\WinPython\\python-3.6.0.amd64\\python.exe"}
import pandas
import feather

# Read flights data and select flights to O'Hare
flights = pandas.read_csv("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.csv")
flights = flights[flights['dest'] == "ORD"]

# Select carrier and delay columns and drop rows with missing values
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
print (flights.head(10))

# Write to feather file for reading from R
feather.write_dataframe(flights, "C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
```

## Back to R

```{r}
library(feather)
library(ggplot2)

# Read from feather and plot
flights <- read_feather("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
ggplot(flights, aes(carrier, arr_delay)) + geom_point() + geom_jitter()
```

środa, 15 lutego 2017

lpsolve - solver in R

To study:
http://flovv.github.io/From_descritpive_to_prescriptive/
https://icyrock.com/blog/2013/12/linear-programming-in-r-using-lpsolve/
http://lpsolve.r-forge.r-project.org/
http://horicky.blogspot.co.uk/2013/01/optimization-in-r.html
http://lpsolve.sourceforge.net/5.5/R.htm

czwartek, 19 stycznia 2017

R packages for the lazy.

Data entry
Datapasta: Copy data from Excel or HTML to an R file - with keyboard shortcuts.  And do it in a nice readable  txt  "tribble" format. https://github.com/MilesMcBain/datapasta

Quick overview
Quickly plot data having no time for a nice ggplot2 code.  https://github.com/stefan-schroedl/plotluck

Quick ensembleR predict data: ensembleR package