piątek, 29 stycznia 2016
What u play with on smartphone? I play R stats :-)
1. Installed GnuRoot app and GnuRoot Wheezy (no real rooting required, it creates its own world!)
2. Refreshed software: apt-get update & apt-get upgrade.
3. Installed R: apt-get install R-base-core. Lots of space is needed, so delete all trash apps from your android phone!
===========
I typed R, created a dataframe to play with, run kmeans on it, with no issues. There is no graphics yet, you will not see plots (probably you can generate them as files), unless you install another heavy MB of linux GUI like lxde. But for my bus trips I use txtplot - it produces density, boxplot, scatter, acf in ASCII.
To see kmeans clusters: txtplot(x,y,pch=cluster_no)
R used phone's internet and downloaded it with no issues - it means you can also download dataframes from a repository online.
A keyboard with arrows is recommended, like Hacker's Keyboard, so that it should be easy to go through previous commands, modify them and correct typo's inside the string..
GnuRoot autor Corbin Champion also offers a solution to run Octave on your phone. I highly recommend him. https://play.google.com/store/apps/details?id=champion.gnuroot
More: Google: install r in android:
http://www.r-ohjelmointi.org/?p=1434
http://stackoverflow.com/questions/13047994/are-there-any-android-version-of-r-without-rooting-the-device
czwartek, 28 stycznia 2016
Watched "Interstellar" with my son. Impressive
"Mantra" repeated frequently in the movie:
Do not go gentle into that good night Dylan Thomas, 1914 - 1953 Do not go gentle into that good night, Old age should burn and rave at close of day; Rage, rage against the dying of the light. Though wise men at their end know dark is right, Because their words had forked no lightning they Do not go gentle into that good night. Good men, the last wave by, crying how bright Their frail deeds might have danced in a green bay, Rage, rage against the dying of the light. Wild men who caught and sang the sun in flight, And learn, too late, they grieved it on its way, Do not go gentle into that good night. Grave men, near death, who see with blinding sight Blind eyes could blaze like meteors and be gay, Rage, rage against the dying of the light. -------------------------------- Nie wchodź łagodnie do tej dobrej nocy, Starość u kresu dnia niech płonie, krwawi; Buntuj się, buntuj, gdy światło się mroczy. Mędrcy, choć wiedzą, że ciemność w nich wkroczy – Bo nie rozszczepią słowami błyskawic – Nie wchodzą cicho do tej dobrej nocy. Cnotliwi, płacząc kiedy ich otoczy Wspomnienie czynów w kruchym wieńcu sławy, Niech się buntują, gdy światło się mroczy. Szaleni słońce chwytający w locie, Wasz śpiew radosny był mu trenem łzawym; Nie wchodźcie cicho do tej dobrej nocy. Posępnym, którym śmierć oślepia oczy, Niech wzrok się w blasku jak meteor pławi; Niech się buntują, gdy światło się mroczy. Błogosławieństwem i klątwą niech broczy Łza twoja, ojcze w niebie niełaskawym. Nie wchodź łagodnie do tej dobrej nocy. Buntuj się, buntuj, gdy światło się mroczy. (wiersz Dylana Thomasa w przekładzie Stanisława Barańczaka)
piątek, 22 stycznia 2016
Coffee break: VBA Class refresher
Where to use: http://stackoverflow.com/questions/118863/when-to-use-a-class-in-vba
Some Access vba reading: http://sourcedaddy.com/ms-access/object-oriented-programming-with-vba.html
http://vbadeveloper.net/optionpricerclassesdesignvba.pdf
very interesting: http://www.thelandbeyondspreadsheets.com/a-simple-example-of-how-to-use-class-modules-for-something-useful-in-excel-vba/
Microsoft article: https://www.microsoftpressstore.com/articles/article.aspx?p=2225071
Indeed Wise Owl: http://www.wiseowl.co.uk/blog/s237/classes.htm
czwartek, 21 stycznia 2016
VBA: cleaning data from ListobjectTable without destroying it.
http://stackoverflow.com/questions/10220906/how-to-select-clear-table-contents-without-destroying-the-table Version improved for my needs checks if data has not been cleaned already:
    With .ListObjects("Table3")
        .Range.AutoFilter
            If .DataBodyRange.Rows.Count > 1 Then
            .DataBodyRange.Offset(1).Resize(.DataBodyRange.Rows.Count - 1, .DataBodyRange.Columns.Count).Rows.Delete
            End If
        .DataBodyRange.Rows(1).Cells.ClearContents
        
    End With
Poland's S&P Ratings Downgrade by M.Tyrmand
środa, 20 stycznia 2016
Excel - fast vlookup with Dictionary (MSScriptingRuntime)
poniedziałek, 18 stycznia 2016
Excel VBA to hide and unhide a sheet on entering the right password.
niedziela, 17 stycznia 2016
What is happening in Poland?
Data nomad reporting from Sahara
Another day with numbers comforting me. I am Mistah W.Kurtz in my cave amidst wilderness of big city.
Numbers are from only God. 
Pure truth. Devoid of adjectives and exclamations. Beautiful beyond any girl you would like to love in your dreams. 
Existing beyond death. Will take you to the stars...
Mistah Kurtz contemplates:
A doberman mistreated badly by a former owner tells the new one how happy it is by forming geometrical shapes in the garden with its toys. I contemplate it daily. Thinking about my family incarcerated, crushed and blacklisted by scientific ideology, in the name of the progress of man.
Adjectives of man are forbidden among us, a taboo.
Numbers are from God. Will protect and help calculate the way from the stars.
The desert conforms me.
I am a Beduin nomad on a camel. I eat dates with my coffee and meet other nomads once a year in a wilderness. Like lone sailors meeting mid-ocean and we talk simple geometry of the universe. finding regularities in the clear sky above.
Greetings from a digital nomad from xxi c. desert in Warsaw. Central Europe.
czwartek, 14 stycznia 2016
środa, 13 stycznia 2016
Excel formula to sort into bins.
Function MyBins(MyValue As Single) As String
Application.Volatile
        If SGPValue > 1 Then
               MyBins = "> 100%"
        
       
        ElseIf MyValue <= 1 And MyValue > 0.8 Then
           MyBins = "100%-80%"
        ElseIf MyValue <= 0.8 And MyValue > 0.65 Then
           MyBins = "80%-65%"
        ElseIf MyValue <= 0.65 And MyValue > 0.5 Then
           MyBins = "65%-50%"
        ElseIf MyValue <= 0.5 And MyValue > 0.2 Then
           MyBins = "50%-20%"
        ElseIf MyValue <= 0.2 And MyValue >= 0 Then
           MyBins = "20%-0%"
        ElseIf MyValue < 0 Then
           MyBins = "< 0"
        
       Else
           MyBins = "Please Verify"
           
    End If
 
    
End Function
Without?
Would you use IF for that purpose? And when there are several bins you will have plenty of nested IFs?
The answer is to use LOOKUP. cf.: http://www.mrexcel.com/forum/excel-questions/631008-excel-formula-sort-into-bins.html#post3130377 In this example you can also see how an array, a table, can be encoded in a formula.
I think it is very neat to use ListObject Table to keep the parameters in, then the formula would look like:
=LOOKUP(B3;Table1[a];Table1[b]) where [a] is a column with upper limits and [b] contains the corresponding names of the bins.
Table 1
a b
-99999 < 1
1 1 to 1,9999
2 2 to 4,9999
5 5 to 9,9999
10 10 and avobe
To count bins to generate a histogram it is probably best to use countifs:
http://exceluser.com/formulas/countifs-frequency-distributions.htm
Survey Science: an interesting excel macro collection for researchers.
Plus, an interesting article on outliers' detection: http://fernandonunezserrano.com/Outlier_Detector.pdf
(and in R: http://www.rdatamining.com/examples/outlier-detection)
wtorek, 12 stycznia 2016
R: mosaic plots with labels? Other graph musings.
I looked at these pages as well:
https://ds4ci.files.wordpress.com/2013/09/user08_jimp_custseg_revnov08.pdf
http://tables2graphs.com/doku.php?id=03_descriptive_statistics#figure_2
https://cran.r-project.org/web/packages/vcd/vignettes/residual-shadings.pdf
http://cran.us.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf
http://www.math.ucla.edu/~anderson/rw1001/library/base/html/mosaicplot.html
summary of cat. values http://stackoverflow.com/questions/14235096/data-summary-based-on-multiple-categorical-variables
Clustering segmentation:
http://marketing-yogi.blogspot.com/2012/12/segmentation-tools-in-r-session-5.html
https://ds4ci.files.wordpress.com/2013/09/jimporzak_rfwithr_dmaac_jan07_webinar.pdf
OT: other graphs
http://www.cyclismo.org/tutorial/R/intermediatePlotting.html
http://stats.stackexchange.com/questions/4089/graphical-data-overview-summary-function-in-r?rq=1
http://www.statmethods.net/advstats/cart.html
http://ww2.coastal.edu/kingw/statistics/R-tutorials/graphically.html
Breaking data into bins: http://stackoverflow.com/questions/12979456/r-code-to-categorize-age-into-group-bins-breaks
Aggregating: http://www.statmethods.net/management/aggregate.html
http://davetang.org/muse/2013/05/22/using-aggregate-and-apply-in-r/
http://www.r-statistics.com/2012/01/aggregation-and-restructuring-data-from-r-in-action/
https://www.soa.org/News-and-Publications/Newsletters/Compact/2012/august/The-R-Corner-%E2%80%93-Aggregate-Function.aspx
poniedziałek, 11 stycznia 2016
Early morning R readings: knn imputation of missing values
czwartek, 7 stycznia 2016
Excel VBA: changing value calculates discount, changing discount calculates value.
Private oListObj As ListObject
      
'Private Sub Worksheet_Activate()
'    ActiveSheet.Protect "pwd", AllowFiltering:=True
'End Sub
    Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
      
    Set oListObj = Worksheets("Quotation").ListObjects("tblProForma")
      
    Application.EnableEvents = True
      
        If Not Intersect(Target, oListObj.ListColumns("Price").DataBodyRange) Is Nothing Then
        Application.EnableEvents = False
        Application.AutoCorrect.AutoFillFormulasInLists = False
        Target.Formula = Target.Value
        Application.EnableEvents = True
        End If
          
        If Not Intersect(Target, oListObj.ListColumns("Discount").DataBodyRange) Is Nothing Then
        Application.EnableEvents = False
        Application.AutoCorrect.AutoFillFormulasInLists = False
        Target.Formula = Round(Target.Value, 5)
        Application.EnableEvents = True
        End If
    End Sub
      
    Private Sub Worksheet_Change(ByVal Target As Range)
      
    Dim PriceDiscountOffset As Integer: PriceDiscountOffset = ActiveSheet.Range("tblProForma[[#All],[Price]:[Discount]]").Columns.Count - 1
          
    Set oListObj = Worksheets("Quotation").ListObjects("tblProForma")
      
    Application.EnableEvents = True
    
    '=============
    'This code prevents macro from ruining table if user resizes it.
    '=============http://www.mrexcel.com/forum/excel-questions/844654-can-i-tell-excel-run-macro-every-time-i-manually-resize-listobject-excel-table.html
    Dim sLastUndoStackItem  As String
  
        On Error Resume Next
        sLastUndoStackItem = Application.CommandBars("Standard").FindControl(ID:=128).List(1)
        On Error GoTo 0
    
     If sLastUndoStackItem = "Table Resize" Then
       Exit Sub
     End If
    
    '=============
    
      
        If Not Intersect(Target, oListObj.ListColumns("Price").DataBodyRange) Is Nothing Then
        Application.EnableEvents = False
        Application.AutoCorrect.AutoFillFormulasInLists = False
        
        
        Target.Offset(0, PriceDiscountOffset).Formula = "=IF([@[Price]]<>"""", -([@[Price]]-[@[Pricelist]])/[@[Price]],"""")"
        Application.EnableEvents = True
        End If
          
        If Not Intersect(Target, oListObj.ListColumns("Discount").DataBodyRange) Is Nothing Then
        Application.EnableEvents = False
        
             
        Application.AutoCorrect.AutoFillFormulasInLists = False
        Target.Offset(0, -PriceDiscountOffset).Formula = "=[@[Pricelist]]-([@[Pricelist]]*[@[Discount]])"
        Application.EnableEvents = True
        End If
      
    End Sub
wtorek, 5 stycznia 2016
Coffee break. Trees, some reading and stupidly playing with tree clusters...
http://www.r-bloggers.com/package-party-conditional-inference-trees/
http://www.statmethods.net/advstats/cart.html
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html
Played like that:
> drzewko<-ctree(mojedane$y~.,data = mojedane[,c(1:3)])
> predict(drzewko, type = "response") 
 
R: quick view at data - correlogram, margin plot etc.
Quick get it from excel
 
> mojedane<-read.table(file="clipboard",dec = ",",header=TRUE) 
 
Then, after: http://www.statmethods.net/advgraphs/correlograms.html 
 
> library(corrgram) 
 
> corrgram(mojedane, order=TRUE, lower.panel=panel.ellipse, + upper.panel=panel.pie, text.panel=panel.txt, + diag.panel=panel.minmax, + main="Korelacje - moje dane")
 
 
To see where the missing data goes.
 
http://www.inside-r.org/packages/cran/VIM/docs/marginplot
 
> library("VIM")
> mojedane<-read.table(file="clipboard",dec = ",",header=TRUE,row.names = 1) > marginplot(mojedane[,c(1,2)])
R: xts and zoo package for missing data; two plots
Useful and basic entry into the topic:
http://nicercode.github.io/intro/missing-data.html and
http://thomasleeper.com/Rcourse/Tutorials/NAhandling.html
http://www.ats.ucla.edu/stat/r/faq/missing.htm
==Fill NA==
A very interesting reading on the use of zoo/xts package here (page 1 and the following): http://publish.illinois.edu/spencer-guerrero/2014/12/11/2-dealing-with-missing-data-in-r-omit-approx-or-spline-part-1/
My dirty notebook:
> is.na (
> mean(Data&Variable1, na.rm=TRUE)
Remove incomplete
> na.omit(Data$Variable1)
> complete.cases(Data)
> complete.data <- complete.cases(Data)
Fill continuous/time series data
> require (xts)
> prices <-na.locf(prices)
> prices <-na.fill(prices, fill="extend") are there other fills?
check out na.approx, na.spline!!!
na.contiguous (to find longest contiguous na's?)
For plotting
na.fill etc functions turn data frame into atomic vectors...
It means that if we plot, we cannot refer to column names but to column numbers.
Plot several plots:
> par(mfrow=(c(2,1))
> plot(prices[,1], type='l')
> plot(prices[,2], type='l')
====
For correlation
cor ()... use the argument use='pairwise', then only the pairs with na will be ommited and not the whole observations.
====
Serious approach: imputation:
http://www.stat.columbia.edu/~gelman/arm/missing.pdf
e.g.: http://stackoverflow.com/questions/11990463/match-fitted-values-from-lm-with-a-data-frame-in-case-of-na-values
https://stat.ethz.ch/pipermail/r-help/2002-January/017756.html
Good article for a study: http://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/
http://research.stowers-institute.org/efg/R/Statistics/loess.htm
http://www.biostat.jhsph.edu/~iruczins/teaching/jf/ch14.pdf
http://www.ats.ucla.edu/stat/r/faq/R_pmm_mi.htm
http://www.stefvanbuuren.nl/publications/MICE%20in%20R%20-%20Draft.pdf
http://stackoverflow.com/questions/2613420/handling-missing-incomplete-data-in-r-is-there-function-to-mask-but-not-remove
poniedziałek, 4 stycznia 2016
Maps of Poland in R
http://www.gauss.pl/analityk/drupal/node/43
http://www.kep.uni.lodz.pl/em/materialy/BANKI/BANKI_DANYCH_cw_10.doc
Most comprehensive:
http://quantup.pl/2015/03/13/analiza-danych/wizualizacja-danych-na-mapach-w-r/
http://www.goldenline.pl/grupy/Komputery_Internet/r/mapa-polski,1523596/
http://www.gis-support.pl/baza-wiedzy/dane/dane-do-pobrania/
http://www.wais.kamil.rzeszow.pl/pakiet-r-openpoland-tutorial/
In Excel
http://www.apocotenexcel.pl/mapa2.htm
sobota, 2 stycznia 2016
R: Just discovered Swirl, interactive courses in R console.
> install_from_swirl("Regression_Models", dev = FALSE, mirror = "github")
This night I completed R programming course, an intro.
Some of my unordered notes, useful perhaps only to myself:
Working directory:
list.files() or dir()
> args(list.files)
> dir.create("testdir")
> file.create("mytest.R")
> file.exists("mytest.R")
> file.rename("mytest.R","mytest2.R")
> file.copy("mytest2.R","mytest3.R")
> file.path("mytest3.R") - good to construct relative, independent of system, paths in a procedure.
> file.path("folder1","folder2")
> dir.create(file.path("testdir2","testdir3"),recursive = TRUE) - creating dir and subdir in one shot
> unlink("testdir2", recursive=TRUE) - deleting directory with all its content, note use Recursive
> unlink("testdir", recursive=TRUE)
Sequences of numbers along some vector:
> seq(along.with = my_seq)
> seq_along(my_seq)
> rep(c(0,1,2),times = 40)
> rep(c(0,1,2), each = 10)
> file.info("mytest.R")$mode --- to grab specific info
Vectors
> paste(my_char, collapse = " ") - collapsing string vector to one string.
> paste("Hello", "world!", sep=" ")
> paste(1:3,c("X","Y","Z"), sep="")
[1] "1X" "2Y" "3Z"
> paste(LETTERS, 1:4, sep="-")
[1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1"
[26] "Z-2"
indexing
> x[-c(2,10)] exclude these numbers, use - once before concatenate
==data.frame==
> colnames(my_data)<-cnames - giving names to columns
> which(ints>7) Which are less then 7?
[1] 1 3 5
> any(ints<0) are there any lesser than 7
[1] FALSE
> all(ints>0)
[1] TRUE
functions
ellipsis arguments...
mad_libs <- function(...){
# Do your argument unpacking here!
args <- list(...)
place <- args[["place"]]
adjective <- args[["adjective"]]
noun <- args[["noun"]]
paste("News from", place, "today where", adjective, "students took to the streets in protest of the new", noun, "being installed on campus.")
}
your own operators, eg. add texts to each other.
"%p%" <- function(left,right){
paste(left,right)
}
> lapply(unique_vals, function(elem) elem[2]) - one liner function
Table apply?
> table(flags$animate) - count flags which are animate :-)
> tapply(flags$animate, flags$landmass, mean) - get the proportion of animate elements in each landmass.
> tapply(flags$population, flags$red, summary) - get the population summary for red in the flag.
> tapply(flags$population, flags$landmass, summary) - get the population summary for each landmass
Aquaint with data
> summary(plants)
if categorical data gets truncated and I need to analyse it:
> table (plants$Active_Growth_Period)
But the most concise info on data one gets via
> str(plants)
Simulation:
Dice throwing 4 times:
> sample(1:6,4, replace = TRUE)
Unfair coin flips:
> flips<- sample(c(0,1),100,replace=TRUE, prob=c(0.3,0.7))
> rbinom(1,size=100, prob=0.7) - number of heads
> rbinom(100, size=1, prob=0.7) - set of all observations
Each probability distribution in R has an
r*** function (for "random"), a d*** function (for "density"), a p*** (for "probability"), and q*** (for
| "quantile").
Time:
> str(unclass(t2)) to have a more compact view of all components of a structure.
> t4<- strptime(t3, "%B %d, %Y %H:%M") - check it out, it converts txt dates into POSIXlt
> difftime(Sys.time(),t1, units='days') time difference in days
Plot:
> plot(cars, xlim = c(10,15)) - limit x to 10..15 range
> plot(cars, pch = 2) - change tick symbols (here to triangles)
> boxplot(mpg ~ cyl, data = mtcars) y mpg to x cyl from data = mtcars
> hist(mtcars$mpg)
 
