With Spring day 1 I start regular intro statistics study:
My notes from day 1:
Standard Error of Mean (SD of a sample):
SD/ root(n) - the larger sample size, the error decreases,
Z-Score: (x-mean)/SD
https://en.wikipedia.org/wiki/Standard_score
T-Score:
Conversion Score = z * (NewSD)+NewMean
NewSD for T-Score = 10 and NewMean = 50
T-Score = Z-Score*10+50
More: Essentials of Testing and Assessment: A Practical Guide for Counselors ...Autorzy Edward S. Neukrug,R. Charles Fawcett p 133 (fragment available online via Google)
Coefficient of Variation
CV% = (SD/Mean)*100
poniedziałek, 20 marca 2017
ExcelVBA: From recorded macro to reusable function.
Problem: with each change in slicer, pivot chart formatting changes.
So I want all my lables vertical (and with each change the damn thing returns to horizontal). My recorded macro did not return anything sensible. Lecture on chart label properties brought me to this solution:
Sub Macro1()
'
' Wersja "Recorder"
'
Dim mySrs As Series
With ActiveSheet.ChartObjects("chValueofOffers").Chart
.SeriesCollection(1).DataLabels.Orientation = xlUpward
.SeriesCollection(2).DataLabels.Orientation = xlUpward
End With
Next I wanted to iterate through series collection so that I do not need to change macro for more series:
Sub Macro2()
'
' Wersja "Obiektowo pętlowa"
'
Set seriesCol = ActiveSheet.ChartObjects("chValueofOffers").Chart.SeriesCollection
For Each mySeries In seriesCol
mySeries.DataLabels.Orientation = xlUpward
Next
Set seriesCol = ActiveSheet.ChartObjects("chValueofOffersTotal").Chart.SeriesCollection
End Sub
Finally... what if I have several charts on the same sheet that need vertical labels. Why not making it a reusable function with chart name as attribute. Now... with each change in the slicer the labels in the two charts get corrected to vertical.
Private Function DataLabelsVertical(mySheet As String, chtName As String)
Dim mySeries As Series
Set seriesCol = Worksheets(mySheet).ChartObjects(chtName).Chart.SeriesCollection
For Each mySeries In seriesCol
mySeries.DataLabels.Orientation = xlUpward
Next
End Function
Private Sub Worksheet_Change(ByVal Target As Range)
DataLabelsVertical "Value_tenders", "chValueofOffers"
DataLabelsVertical "Value_tenders", "chValueofOffersTotal"
DataLabelsVertical "Status", "chStatusValue"
DataLabelsVertical "Status", "chStatusValueTotal"
End Sub
End Function
Sources:
http://www.java2s.com/Code/VBA-Excel-Access-Word/Excel/Loopthrougheachseriesinchartandaltermarkercolors.htm
http://stackoverflow.com/questions/21165581/vba-looping-through-all-series-within-all-charts
So I want all my lables vertical (and with each change the damn thing returns to horizontal). My recorded macro did not return anything sensible. Lecture on chart label properties brought me to this solution:
Sub Macro1()
'
' Wersja "Recorder"
'
Dim mySrs As Series
With ActiveSheet.ChartObjects("chValueofOffers").Chart
.SeriesCollection(1).DataLabels.Orientation = xlUpward
.SeriesCollection(2).DataLabels.Orientation = xlUpward
End With
Next I wanted to iterate through series collection so that I do not need to change macro for more series:
Sub Macro2()
'
' Wersja "Obiektowo pętlowa"
'
Set seriesCol = ActiveSheet.ChartObjects("chValueofOffers").Chart.SeriesCollection
For Each mySeries In seriesCol
mySeries.DataLabels.Orientation = xlUpward
Next
Set seriesCol = ActiveSheet.ChartObjects("chValueofOffersTotal").Chart.SeriesCollection
End Sub
Finally... what if I have several charts on the same sheet that need vertical labels. Why not making it a reusable function with chart name as attribute. Now... with each change in the slicer the labels in the two charts get corrected to vertical.
Private Function DataLabelsVertical(mySheet As String, chtName As String)
Dim mySeries As Series
Set seriesCol = Worksheets(mySheet).ChartObjects(chtName).Chart.SeriesCollection
For Each mySeries In seriesCol
mySeries.DataLabels.Orientation = xlUpward
Next
End Function
Private Sub Worksheet_Change(ByVal Target As Range)
DataLabelsVertical "Value_tenders", "chValueofOffers"
DataLabelsVertical "Value_tenders", "chValueofOffersTotal"
DataLabelsVertical "Status", "chStatusValue"
DataLabelsVertical "Status", "chStatusValueTotal"
End Sub
End Function
Sources:
http://www.java2s.com/Code/VBA-Excel-Access-Word/Excel/Loopthrougheachseriesinchartandaltermarkercolors.htm
http://stackoverflow.com/questions/21165581/vba-looping-through-all-series-within-all-charts
piątek, 17 lutego 2017
R: self organizing maps
Interesting topics, see
https://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
http://www.slideshare.net/shanelynn/2014-0117-dublin-r-selforganising-maps-for-customer-segmentation-shane-lynn
Other interesting reading
https://www.r-bloggers.com/r-an-integrated-statistical-programming-environment-and-gis/
https://www.r-bloggers.com/how-to-perform-pca-with-r/
https://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
http://www.slideshare.net/shanelynn/2014-0117-dublin-r-selforganising-maps-for-customer-segmentation-shane-lynn
Other interesting reading
https://www.r-bloggers.com/r-an-integrated-statistical-programming-environment-and-gis/
https://www.r-bloggers.com/how-to-perform-pca-with-r/
czwartek, 16 lutego 2017
SQL playground, remove duplicates and count students that passed exam.
Patent na usuwanie duplikatów z tabeli “in
place” tj bez nadpisywania jej, bez usuwania i wstawiania innej:
Do przetestowania:
Kolumny ID (nie powtarza się). imie,
nazwisko, dane (może duplikować się).
DELETE
*
FROM Tabela1
WHERE [id] NOT IN
(SELECT Max(Tabela1.id) AS id
FROM Tabela1
GROUP BY Tabela1.imie, Tabela1.nazwisko, Tabela1.dane);
FROM Tabela1
WHERE [id] NOT IN
(SELECT Max(Tabela1.id) AS id
FROM Tabela1
GROUP BY Tabela1.imie, Tabela1.nazwisko, Tabela1.dane);
w linii GROUP BY można określić gdzie
szukamy duplikatów, w jakich kolumnach
Select Max... można użyć First(), albo Min(), żeby określić, które zduplikowane rekordy zachować.
Select Max... można użyć First(), albo Min(), żeby określić, które zduplikowane rekordy zachować.
Są trzy tabele
1. Studenci (_indeks_, imie, nazwisko)
2. Kursy (_id_, tytul, godzin, punktow), gdzie punktow oznacza ile punktow za zaliczenie kursu się dostaje
3. Szkolenia( osoba, kurs, zaliczenie) gdzie osoba to numer indeksu studenta, zaliczenie to data zaliczenia, a kurs to numer kursu
i pierwsze zadanie z matury brzmi: Ilu studentów zaliczyło w pierwszym terminie? (do 30 czerwca 2016)<pre>
Select count([_indeks_Stud]) AS ilu_zdalo From
(Select [Studenci$].[_indeks_Stud]
From ([Kursy$]
Inner Join [Szkolenia$] On [Kursy$].[_id_kurs] = [Szkolenia$].kurs)
Inner Join [Studenci$] On [Studenci$].[_indeks_Stud] = [Szkolenia$].osoba
Where [Szkolenia$].zaliczenie <= #2016-06-30#
Group By [Studenci$].[_indeks_Stud]
Having Sum([Kursy$].punktow) >= 15)
<./pre>
Kod powstał w ide Flyspeed SQL Query na bazie stworzonej w zakładkach w Excelu, stąd znaki dolara.... kod więc powinien działać po wklejeniu do Microsoft Query.
Python in RStudio
Data used:
Titanic data: https://www.kaggle.com/c/titanic/data
and tutorial: http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb
Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv
Software used:
Portable scientific winpython (with pandas scikit-learn):
https://sourceforge.net/projects/winpython/?source=typ_redirect
To work it needed windows updates (my OS is windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093
To install packages from source it needed:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed feather package so I dowloaded it and used python command: pip install
as taught here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
and installed from source: https://github.com/wesm/feather/tree/master/python
To learn how to use other languages in RStudio: http://rmarkdown.rstudio.com/authoring_knitr_engines.html
I also wanted to try if some portable version of bash would work. No problem:
http://win-bash.sourceforge.net/
Code for my playground.
Titanic data: https://www.kaggle.com/c/titanic/data
and tutorial: http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb
Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv
Software used:
Portable scientific winpython (with pandas scikit-learn):
https://sourceforge.net/projects/winpython/?source=typ_redirect
To work it needed windows updates (my OS is windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093
To install packages from source it needed:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed feather package so I dowloaded it and used python command: pip install
as taught here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
and installed from source: https://github.com/wesm/feather/tree/master/python
To learn how to use other languages in RStudio: http://rmarkdown.rstudio.com/authoring_knitr_engines.html
I also wanted to try if some portable version of bash would work. No problem:
http://win-bash.sourceforge.net/
Code for my playground.
--- title: "R Notebook" output: html_notebook --- ## Bash ```{bash, engine.path="C:\\Users\\jkotows2\\Desktop\\shell.w32-ix86\\bash.exe"} cat flights1.csv flights2.csv flights3.csv > flights.csv ``` ## Python http://rmarkdown.rstudio.com/authoring_knitr_engines.html ```{python, engine.path="C:\\Users\\jkotows2\\Desktop\\WinPython\\python-3.6.0.amd64\\python.exe"} import pandas import feather # Read flights data and select flights to O'Hare flights = pandas.read_csv("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.csv") flights = flights[flights['dest'] == "ORD"] # Select carrier and delay columns and drop rows with missing values flights = flights[['carrier', 'dep_delay', 'arr_delay']] flights = flights.dropna() print (flights.head(10)) # Write to feather file for reading from R feather.write_dataframe(flights, "C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather") ``` ## Back to R ```{r} library(feather) library(ggplot2) # Read from feather and plot flights <- read_feather("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather") ggplot(flights, aes(carrier, arr_delay)) + geom_point() + geom_jitter() ```
środa, 15 lutego 2017
lpsolve - solver in R
To study:
http://flovv.github.io/From_descritpive_to_prescriptive/
https://icyrock.com/blog/2013/12/linear-programming-in-r-using-lpsolve/
http://lpsolve.r-forge.r-project.org/
http://horicky.blogspot.co.uk/2013/01/optimization-in-r.html
http://lpsolve.sourceforge.net/5.5/R.htm
http://flovv.github.io/From_descritpive_to_prescriptive/
https://icyrock.com/blog/2013/12/linear-programming-in-r-using-lpsolve/
http://lpsolve.r-forge.r-project.org/
http://horicky.blogspot.co.uk/2013/01/optimization-in-r.html
http://lpsolve.sourceforge.net/5.5/R.htm
czwartek, 19 stycznia 2017
R packages for the lazy.
Data entry
Datapasta: Copy data from Excel or HTML to an R file - with keyboard shortcuts. And do it in a nice readable txt "tribble" format. https://github.com/MilesMcBain/datapasta
Quick overview
Quickly plot data having no time for a nice ggplot2 code. https://github.com/stefan-schroedl/plotluck
Quick ensembleR predict data: ensembleR package
Datapasta: Copy data from Excel or HTML to an R file - with keyboard shortcuts. And do it in a nice readable txt "tribble" format. https://github.com/MilesMcBain/datapasta
Quick overview
Quickly plot data having no time for a nice ggplot2 code. https://github.com/stefan-schroedl/plotluck
Quick ensembleR predict data: ensembleR package
Subskrybuj:
Posty (Atom)