piątek, 26 lutego 2016

R: Scavenging Github: imDev and TeachingDemos by Dmitry Grapov

Interesting Excel interface to R via RExcel: https://sourceforge.net/projects/imdev
Some interesting tutorials to study: https://github.com/dgrapov/TeachingDemos
https://sourceforge.net/projects/teachingdemos/files/
The author responded next day. I need to follow up!

środa, 24 lutego 2016

Excel VBA: Evaluate() or square brackets shorthand.

I recommend this trick to shorten the code for entering/copying/calculating data in spreadsheet ranges:
http://www.ozgrid.com/forum/showthread.php?t=52372&page=2

Example:
Sub Test()
    [A1].Offset(0, 1) = [=SUM(Table1[A])]
End Sub

sobota, 13 lutego 2016

R appeal to Ministry of Digital Affairs of Poland

Dear Minister Anna Streżyńska


I would like humbly to propose that Polish Ministry of Digital Affairs should support  the development of R community in Poland with a dedicated government program.


R community is a dynamically growing number of users of free R statistical environment which facilitates (indeed makes it easy for everyone) programming, data cleaning, exploration and data mining. In Warsaw alone it includes growing number of  enthusiasts, Warsaw Open University alumni of R courses, regular meetups (http://www.spotkania-entuzjastow-r.pl/), Smarter Poland activities and many others.


R, can provide a single environment for:

-  online access to public statistical data (medical, social, crime, economic, weather),

- a remote interactive teaching data science and programming in schools - teaching the IT skills that will be of immediate use to anyone  from business startup-per to hobbyist building a drone capable of  machine learning),

- visualizing and representing analysis results related to Poland (choropleths or subplots imposed on the map of Poland)

- the environment can also be set up for quick generating very high quality and publication ready reports that can be immediately printed and published on-line.


R can facilitate government transparency and promoting Open Government programme. (http://radar.oreilly.com/2009/07/making-government-transparent.html, http://ropengov.github.io/)


Some ideas of what Ministry of Digital Affairs can coordinate and assist with is:

- development of a dedicated R package for accessing public data, analysing and visualising it on an administrative map of Poland,

-development of an R distance learning package teaching statistics to Polish students (it can be the the matter of preparing a localized quality content, the tool "swirl" is available for that purpose),

- encouragement of public offices to publish current data that can be easily analyzed in R (csv, . This can be current information from treasury, statistical offices, police (geography of crime incidents), weather, health diagnosis and procedures (NHF and Ministry of Health data) - information devoid of personal information but fresh and detailed enough for Polish citizen to make informed decisions.


Advantages for Polish administration and economy:

Free R can be compared to expensive SPSS, Statistica and SAS packages - it can save millions giving powerful mining capabilities to government at any level and all citizens. It has been popular with world best universities, now it is used by big media houses, banks and research labs.

R data mining can be used for evectively analysing any data and drawing right conclusions in  health management, poverty, tax evasion, crime incidence, even counterintelligence and anti-terrorism. R literacy can generate high quality jobs and bring measurable profit because it is a tool facilitating innovation and right decisions based on science.

It gives Poland a unique chance to be among the first to embrace the ideas of true modern democracy expressed in access to information and true modern freedom of making informed choices.



Sincerely Yours,


Jacek Kotowski

czwartek, 11 lutego 2016

Coffee break: Lesson ideas: R and Titanic disaster - trees and forests.

Great ideas for a beginner like me to play with data mining.
Data and challenge proposed by  kaggle.com:
https://www.kaggle.com/c/titanic/dat

Good tutorials and proposed solutins here:
http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r and

https://github.com/trevorstephens/titanic

Also: https://www.kaggle.com/amoyakd/titanic/randomforest-method-v1-0

https://www.kaggle.com/c/titanic/forums

Some reading on R capabilities: 
http://www.edureka.co/blog/implementation-of-decision-tree/
http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/


Other tutorials:
Iris data: http://rischanlab.github.io/RandomForest.html
(R Basics worth exploring: http://rischanlab.github.io/)

http://dni-institute.in/blogs/random-forest-using-r-step-by-step-tutorial/ (might be even better for a start

http://www.edureka.co/blog/implementation-of-decision-tree/

Biostars Tutorial: Machine Learning For Cancer Classification - Part 1 - Preparing The Data Sets
https://www.biostars.org/p/85124/

http://www.tutorialspoint.com/r/r_random_forest.htm

http://www.analyticsvidhya.com/blog/2015/09/random-forest-algorithm-multiple-challenges/

Other important topics to explore:
http://www.analyticsvidhya.com/blog/2015/12/faster-data-manipulation-7-packages/





wtorek, 9 lutego 2016

Rafał Wójcikowski -constructive critique of gov pronatalist policy in Polish Parliament

Coffee break, listened briefly to Polish Parliament debate on pronatalist policy 500PLN/month for each child.
Very good alternative proposed by Rafał Wójcikowski: https://pl.wikipedia.org/wiki/Rafa%C5%82_W%C3%B3jcikowski Based on Milton Friedman's negative income tax https://en.wikipedia.org/wiki/Negative_income_tax Implemented in Israel. Need to read more on this. Will PiS gov have courage and guts to consider it? I wonder what min. Morawiecki thinks. Point for Kukiz15 party.

niedziela, 7 lutego 2016

Small plots on a plot: R

Pie charts instead of points or bubbles in x-y/scatter-plot/bubble plot: Uncle google found this for me:

http://blog.revolutionanalytics.com/2012/09/visualize-complex-data-with-subplots.html
http://www.r-bloggers.com/embeding-a-subplot-in-ggplot-via-subview/
http://vita.had.co.nz/papers/embedded-plots.pdf

ggsubplot

http://stackoverflow.com/questions/26577669/tiny-pie-charts-to-represent-each-point-in-an-scatterplot-using-ggplot2
http://stackoverflow.com/questions/7714677/r-scatterplot-with-too-many-points/16122003#16122003



pieGlyph
http://stackoverflow.com/questions/20465070/barplots-on-a-map/20468836#20468836

Maybe it will be possible in ggvis??
http://ggvis.rstudio.com/ggvis-basics.html


===========
Here a solution without ggplot (p.11):
https://www.r-project.org/doc/Rnews/Rnews_2003-2.pdf

====
plotrix and floating.pie?
http://ask.programmershare.com/2269_10385068/

====
http://gis.stackexchange.com/questions/4568/effectively-displaying-demographic-data-on-a-printed-map

====
ggtree
http://stackoverflow.com/questions/10368180/plotting-pie-graphs-on-map-in-ggplot

mapplots
http://www.molecularecologist.com/2012/09/making-maps-with-r/


Rozwiazanie w Excelu:
http://www.andypope.info/charts/piedatamarkers.htm

piątek, 5 lutego 2016

R - Machine learning cheatsheet

A syntax for applying some machine learning procedures in R. Misleadingly.... simple :-) http://vitalflux.com/cheat-sheet-10-machine-learning-algorithms-r-commands/

środa, 3 lutego 2016

Excel: fill in table from another table.

UPDATE: I am now working on replacing offsets with absolute or structured references, ie. references to Listobject Table. This will make my vba immune from inserting columns and allow hiding the source table:

Private Sub FindCopyValues()
     
     

Dim strKodRange As Range
Dim lObjSource As ListObject: Set lObjSource = Sheets("Pricelist").ListObjects("tblPricelist")
Dim lObjTarget As ListObject: Set lObjTarget = Sheets("Quote").ListObjects("tblQuote")

Dim iTableRowNum As Integer

'set cursor in first cell of column Code in tblPricelist
Dim curAddr As Range: Set curAddr = lObjTarget.DataBodyRange.Cells(1, lObjTarget.ListColumns("Code").Index)

'set what to look for
Dim strKod As String: strKod = curAddr.Value



'what offsets between Code and Price

With lObjTarget
    Dim Code2Price As Integer: Code2Price = .ListColumns("Price").Index - .ListColumns("Code").Index
    Dim Code2Description As Integer: Code2Description = .ListColumns("Description").Index - .ListColumns("Code").Index


End With



      
   Do Until strKod = ""
        
            Set Cell = lObjSource.ListColumns("Code").DataBodyRange.Cells.Find(What:=strKod, MatchCase:=False, LookAt:= _
            xlWhole, SearchFormat:=False)
            
            iTableRowNum = Cell.Row - lObjSource.DataBodyRange.Rows(1).Row + 1
            Debug.Print iTableRowNum
    
'            With Cell
'
'                   curAddr.Offset(0, Code2Price).Value = .Offset(0, 1).Value
'                   curAddr.Offset(0, Code2Description).Value = .Offset(0, 2).Value
'
'            End With
               
            'Albo
                   curAddr.Offset(0, Code2Price).Value = lObjSource.DataBodyRange.Cells(iTableRowNum, lObjSource.ListColumns("Price").Index)
                   curAddr.Offset(0, Code2Description).Value = lObjSource.DataBodyRange.Cells(iTableRowNum, lObjSource.ListColumns("Description").Index)
               
               
               
            Set curAddr = curAddr.Offset(1, 0)
            strKod = curAddr.Text
       
   Loop
       
Application.ScreenUpdating = True
Application.EnableEvents = True
   
   
End Sub

Problem z vlookupem, nie zwraca wartości wraz z linkiem do strony internetowej, zdjęcia czy dokumentu.
Rozwiązanie, np. użycie VBA


Private Sub Worksheet_Change(ByVal Target As Range)
 
Dim strKod As String
 
strKod = Sheets("Baza").Range("G4").Text
  
If Target.Address = "$G$4" Then
 
     Range("$G$4").Select
      
    With Sheets("Aku").Cells.Find(What:=strKod, after:=ActiveCell, LookIn:=xlValues, LookAt:= _
            xlWhole, SearchOrder:=xlByColumns, SearchDirection:=xlNext, MatchCase:= _
            False)
             
            .Offset(0, 1).Copy Destination:=Sheets("Baza").Range("H4")
      
            .Offset(0, 2).Copy Destination:=Sheets("Baza").Range("I4")
    End With
   
    End If
 
End Sub
Powtarza się kod .Offset... więc można go połączyć w jedną linię za pomocą Union. Ponadto, nie chcę szukać w całym arkuszu a jedynie w kolumnie tabeli.
Private Sub Worksheet_Change(ByVal Target As Range)

Dim strKod As String

strKod = Sheets("Baza").Range("G4").Text
oListObj = Sheets("Aku").ListObjects("Tabela1").ListColumns("Nazwa aku.")
 
If Target.Address = "$G$4" Then

     Range("$G$4").Select
     
    With oListObj.DataBodyRange.Cells.Find(What:=strKod, after:=ActiveCell, LookIn:=xlValues, LookAt:= _
            xlWhole, SearchOrder:=xlByColumns, SearchDirection:=xlNext, MatchCase:= _
            False)
            
            Union(.Offset(0, 1), .Offset(0, 2)).Copy Destination:=Sheets("Baza").Range("H4")
     

    End With

    End If

End Sub

A co jeśli chcemy wyszukać nie tylko w komórce G4 ale w całej kolumnie?
Private Sub FindCopyValues()
   
Dim strKod As String
Dim strKodRange As Range
Dim objList As ListObject
Dim curAddr As Object
 
'Application.ScreenUpdating = False
'Application.EnableEvents = False
 
 
Set objList = Sheets("Aku").ListObjects("Tabela1")
Set curAddr = Worksheets("Baza").Range("$G$4")
   
 
    strKod = curAddr.Value
    
   Do Until strKod = ""
      
            Set cell = objList.ListColumns("Nazwa aku.").DataBodyRange.Cells.Find(What:=strKod, MatchCase:=False, LookAt:= _
            xlWhole, SearchFormat:=False)
  
            With cell
                       
                    Union(.Offset(0, 1), .Offset(0, 2)).Copy Destination:=curAddr.Offset(0, 1)
         
            End With
             
   Set curAddr = curAddr.Offset(1, 0)
   strKod = curAddr.Text
     
   Loop
     
Application.ScreenUpdating = True
Application.EnableEvents = True
 
 
End Sub

Inny temat: chcemy skopiować tylko widoczne wiersze do innej tabeli (niewidoczne nas nie interesują).
 

Sub CopyVisibleProducts()

Sheets("cennik").ListObjects("Table1").DataBodyRange.SpecialCells(xlCellTypeVisible).Copy Sheets("pro_forma").Range("b5")
Application.CutCopyMode = False

End Sub






poniedziałek, 1 lutego 2016

Coffee break: imputation

From CranR:
Simple mean imputation: http://www.r-bloggers.com/example-2014-5-simple-mean-imputation/

knnImputation {DMwR} Function that fills in all NA values using the k Nearest Neighbours of each case with NA values.
centralImputation {DMwR} This function fills in any NA value in all columns of a data frame with the statistic of centrality (given by the function centralvalue()) of the respective column.

Other: kNNImpute {imputation} Imputation using k-nearest neighbors. For each record, identify missinng features. http://www.bioconductor.org/packages/release/bioc/html/impute.html
 impute {Hmisc} http://www.inside-r.org/packages/cran/hmisc/docs/impute http://www.jstatsoft.org/article/view/v023i10 http://docs.zeligproject.org/en/latest/using-Zelig-with-Amelia.html