Friday, 29 April 2016

Syria: zmuszeni do ucieczki/ forced to flee | PAH |






If we are afraid of militants among the refugees already in the EU, why not bring in families directly from the war zone? Casualties, those in need. Be the first in the EU to do it right.
If we are afraid of radical Sunni Muslims, why not first bring in Christians, who are in the most danger from IS radicals? If we are afraid of radical Islam, of Wahhabis, why not forbid their practice altogether in Poland? Sorry, we are not so tolerant that we should risk losing our heads.
If we are afraid of ghettos, why not distribute refugees evenly...
If we do not have people for security screening, why not involve Syrians living in Poland in our security forces?

Solve problems. Inform the public. Talk. Act.


Refugees (for your coffee break). Scene from Baraka. Music by Dead Can Dance.

Thursday, 28 April 2016

Coffee break: Plotly in Excel or PowerPoint? H2O on local cloud?

Can interactive R/ggplot2 plots be used in PowerPoint via plotly? To be tested at home with this Office 2013 plugin: https://store.office.com/plotly-d3-js-charts-for-powerpoint-and-excel-WA104379485.aspx?assetid=WA104379485

I need some H2O to survive. (Multi-computer data mining at home? Try deep learning?)
http://docs.h2o.ai/h2oclassic/deployment/multinode.html
http://www.kdnuggets.com/2015/01/interview-arno-candel-h20-deep-learning.html
https://tagteam.harvard.edu/hub_feeds/1981/feed_items/1369576


R in robotics? http://www.bnosac.be/index.php/blog/39-using-r-in-robotics-applications-with-ros http://wiki.ros.org/Robots

Monday, 25 April 2016

Coffee break: Ensembles

Ensembles:
http://machinelearningmastery.com/machine-learning-ensembles-with-r/
http://amunategui.github.io/blending-models/
http://blog.revolutionanalytics.com/2014/04/ensemble-packages-in-r.html
http://www.overkillanalytics.net/more-is-always-better-the-power-of-simple-ensembles/
http://www.r-bloggers.com/caretensemble-classification-example/

http://www.vikparuchuri.com/blog/intro-to-ensemble-learning-in-r/

https://inclass.kaggle.com/c/15-071x-the-analytics-edge-summer-2015/forums/t/15386/has-anyone-tried-model-stacking/86164#post86164

https://github.com/ujjwalkarn/Machine-Learning-Tutorials/blob/master/README.md

For beginners:
http://www.r-bloggers.com/user2013-the-caret-tutorial/
http://www.sharpsightlabs.com/quick-introduction-machine-learning-r-caret/
http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-with-r/
http://www.sharpsightlabs.com/data-analysis-example-r-supercars-part2/

http://www.computerworld.com/article/2486425/business-intelligence/business-intelligence-4-data-wrangling-tasks-in-r-for-advanced-beginners.html
http://www.computerworld.com/article/2884322/app-development/learn-r-programming-basics-with-our-pdf.html#tk.ctw-eos

Forecasting:
https://www.otexts.org/fpp/using-r

Geospatial:
http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html
https://github.com/Robinlovelace/Creating-maps-in-R

ggplot2 snippets:
http://www.computerworld.com/article/2936729/data-analytics/free-download-save-r-data-visualization-time-with-these-ggplot2-code-snippets.html

Best packages/commands:
http://www.computerworld.com/article/2921176/business-intelligence/great-r-packages-for-data-import-wrangling-visualization.html
http://www.personality-project.org/r/r.commands.html

Two-minute tutorials:
http://www.twotorials.com/
http://courses.had.co.nz/

Friday, 22 April 2016

R-Meetups. New superpower in the making.

In the world: http://blog.revolutionanalytics.com/2015/06/r-user-groups-are-everywhere.html
In Poland: I attended one yesterday in Warsaw. Eye-openers: deep learning (a meeting with winners of a Kaggle competition), H2O and the MS Visual Studio R add-in. http://www.meetup.com/Spotkania-Entuzjastow-R-Warsaw-R-Users-Group-Meetup/events/228608144/

Wednesday, 20 April 2016

My R coffee break: R in Visual Studio, updateR, free R stuff.

Microsoft mimics RStudio in Visual Studio?
https://www.visualstudio.com/en-us/features/rtvs-vs.aspx


Updating R to a new version is easy with installr; it may help automate lab IT work:
http://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/

Interesting resource: Online Open Access Textbooks, here on forecasting: https://www.otexts.org/fpp/data

Wow: Hadley Wickham: R for data science book in the making. http://r4ds.had.co.nz/

Monday, 18 April 2016

SQL with VBS when data too big for Excel

I will use this when data is too big for Excel and I need to quickly filter, aggregate or left-join it. Excel itself is not needed. Think of it as a poor man's solution for MS Windows (version 7 or later; it may also work on older versions with an older Access runtime).

1. Install Microsoft Access Runtime; the newest is 2013: https://www.microsoft.com/nl-nl/download/details.aspx?id=39358

2. Put in the same folder: your initial data in a text file (csv) and an empty output file (csv) with a single row listing all the column names that the query result should contain.

3. Prepare a Schema.ini file: describe the two csv files in the following format (an example):

[PhoneList.csv]
ColNameHeader=True
Format=Delimited(;)
DateTimeFormat=yyyy-mm-dd
MaxScanRows=25
CharacterSet=ANSI
Col1=Surname Char Width 10
Col2=Name Char Width 10
Col3=No Long Width 10


[exp.csv]
ColNameHeader=True
Format=Delimited(;)
Col1=Surname Char Width 10
Col2=Name Char Width 10
Col3=No Long Width 10

4. Prepare a vbs file: it should contain the following code (an example):

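Option Explicit

' Folder containing this script, the csv files and Schema.ini
Dim db: db = CreateObject("Scripting.FileSystemObject").GetParentFolderName(WScript.ScriptFullName)

Dim cn: Set cn = CreateObject("ADODB.Connection")

' Text driver: the folder acts as the database, each csv file as a table
cn.Open _
    "Provider=Microsoft.ACE.OLEDB.15.0;" & _
    "Data Source=" & db & ";" & _
    "Extended Properties=""text;HDR=YES;FMT=Delimited(;)"""

' Append the filtered rows to the output file
cn.Execute "INSERT INTO [exp.csv] SELECT [Surname],[Name],[No] FROM [PhoneList.csv] WHERE [No] > -20"

cn.Close
Set cn = Nothing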

5. The environment should be all 32-bit or all 64-bit, not a mix. Unfortunately I have a 64-bit system and 32-bit Office, so I use a workaround to start the script in 32-bit mode:

c:\windows\syswow64\cscript.exe moj_skrypt.vbs

Result: I tried the script on a csv file with 1 million records and it took 2 s to return the filtered data in the output csv.

Note to myself and others: here you will find an alternative solution using R and the ff package, with test data of 2.6 million records. For advanced jobs there is a superb idea of using ff with dplyr (ffbase2). I need to test it. http://www.r-bloggers.com/if-you-are-into-large-data-and-work-a-lot-with-package-ff/
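
For comparison, here is a rough base R counterpart to the VBS filter above. It is not the ff approach from the linked post, just a minimal sketch that reads a semicolon-delimited csv in chunks so the whole file never has to fit in memory. The function name filter_csv_in_chunks, its arguments and the three demo rows are made up for illustration; the column names follow the Schema.ini example.

```r
# Filter a semicolon-delimited csv chunk by chunk (hypothetical helper, base R only)
filter_csv_in_chunks <- function(infile, outfile, keep_rows, chunk_rows = 100000L) {
  con <- file(infile, open = "r")
  on.exit(close(con))

  header <- readLines(con, n = 1)
  writeLines(header, outfile)                    # output starts with the same header
  col_names <- strsplit(header, ";", fixed = TRUE)[[1]]

  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break                # end of file reached
    chunk <- read.table(text = lines, sep = ";", col.names = col_names,
                        quote = "", comment.char = "", stringsAsFactors = FALSE)
    kept <- chunk[keep_rows(chunk), , drop = FALSE]
    write.table(kept, outfile, sep = ";", append = TRUE,
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  invisible(outfile)
}

# Tiny demo with made-up rows in temporary files
infile  <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
writeLines(c("Surname;Name;No", "Kowalski;Jan;5", "Nowak;Anna;-30", "Lis;Ewa;12"), infile)

# Same condition as the SQL example: WHERE [No] > -20
filter_csv_in_chunks(infile, outfile, function(d) d$No > -20, chunk_rows = 2L)
readLines(outfile)
```

The filter is passed in as a function, so the same helper can be reused with other conditions. Aggregations and joins would need more work than this sketch covers.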

BIQdata - a public data exploration portal by Gazeta Wyborcza.

I am not a subscriber, so I cannot see how well the idea is executed. I think that not offering a trial period or some free content is a mistake. But I hope the service will motivate other information providers to copy the idea and build data exploration sections, or at least have one data exploration guy in the newsroom.
A data exploration tool can be dangerous in the hands of an investigative journalist. It can show, in an objective way, which public tenders are suspicious or where government action against crime and low quality of life is insufficient.

If your budgets are tight, invest in an R guy and use markdown. I will be happy to see journalists using data mining tools to explore public databases and present (and validate) results of research in an informative way.

Friday, 8 April 2016

Open Intro: free introductory statistics handbook, videos plus R lab files.

https://www.openintro.org/ - my recommendation for today.

caret::preProcess, how to see the results.

Being a perpetual beginner, I wondered how to see the results of preprocessing (normalizing, scaling, centering) my data with the caret package in R. I typed some data in Excel and played with it in RStudio. At first I could not figure out where the results of the preProcess function are. It turns out the preprocessing object must be applied to the data with "predict".

library(caret)

# Read the data copied from Excel (Windows clipboard, decimal comma)
x <- read.table(file = "clipboard", dec = ",", header = TRUE)
x

# Note: base R has a built-in function for centering and scaling:
# preProcValues <- scale(x, center = TRUE, scale = TRUE)

# Here I build a "model": a preprocessing object
preProcObject <- preProcess(x, method = c("center", "scale", "YeoJohnson"))

# To see the results I must apply it to the data
Data_PreProc <- predict(preProcObject, x)

# I build a table to see source data and processed data side by side
# (this assumes the column copied from Excel is named "data")
Output <- cbind("input" = x$data, "output" = Data_PreProc$data)

Output
plot(Output)
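
Why the extra "predict" step? A preprocessing object only stores what it learned from the data (for centering and scaling, the column means and standard deviations); applying those stored values is a separate step. The sketch below mimics that two-step pattern in base R, without caret. The helper names fit_center_scale and apply_center_scale and the toy numbers are made up for illustration and are not part of caret; the Yeo-Johnson step is left out.

```r
# Toy data, made up for illustration
x <- data.frame(data = c(1.5, 2.0, 3.5, 10.0, 4.0))

# Step 1: "fit" - learn the parameters, but do not transform anything yet
fit_center_scale <- function(df) {
  list(center = sapply(df, mean), scale = sapply(df, sd))
}

# Step 2: "apply" - use the stored parameters on data with the same columns
apply_center_scale <- function(fit, df) {
  as.data.frame(scale(df, center = fit$center, scale = fit$scale))
}

fit <- fit_center_scale(x)
x_scaled <- apply_center_scale(fit, x)

# Source and processed values side by side
output <- cbind(input = x$data, output = x_scaled$data)
output
```

Because the parameters live in the fitted object, the same fit can later be applied to new data, which is the reason for keeping the two steps apart.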