Thursday, 17 December 2015

CoffeeBreak: Outliers, Kernel Density

Reading on using IQR to identify outliers in Excel:
http://datapigtechnologies.com/blog/index.php/highlighting-outliers-in-your-data-with-the-tukey-method/
http://brownmath.com/stat/nchkxl.htm
esp. see worksheet: http://brownmath.com/stat/prog/normalitycheck.xlsm
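
A minimal VBA sketch of the Tukey (IQR) fence idea from the readings above. The macro name, the 1.5 multiplier default and the yellow highlight are my own assumptions, not code taken from the linked pages; it also assumes the selection contains at least one numeric cell.

```vba
Sub HighlightTukeyOutliers()
    ' Hypothetical helper (not from the linked pages).
    ' Colors selected cells that fall outside the Tukey fences
    ' Q1 - k*IQR and Q3 + k*IQR.
    Const k As Double = 1.5 'conventional multiplier; 3 is often used for "far out" points
    Dim q1 As Double, q3 As Double, iqr As Double
    Dim lowerFence As Double, upperFence As Double
    Dim cell As Range

    q1 = WorksheetFunction.Quartile(Selection, 1)
    q3 = WorksheetFunction.Quartile(Selection, 3)
    iqr = q3 - q1
    lowerFence = q1 - k * iqr
    upperFence = q3 + k * iqr

    For Each cell In Selection.Cells
        'Value2 is a Double only for numeric cells, so blanks and text are skipped
        If VarType(cell.Value2) = vbDouble Then
            If cell.Value2 < lowerFence Or cell.Value2 > upperFence Then
                cell.Interior.Color = RGB(255, 255, 0)
            End If
        End If
    Next cell
End Sub
```

Select the data range first, then run the macro. Because quartiles barely move when a few extreme values are present, the fences are much less affected by the outliers themselves than a mean/SD rule.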

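Using the standard deviation (SD) is not a good idea, because the mean and SD are themselves distorted by the outliers being tested, but this macro is a good starting point: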
http://www.mrexcel.com/forum/excel-questions/732424-how-remove-outliers-data-set-2.html
Sub outliers_mod2()
    'Highlights cells in the current selection that lie more than
    'NoStdDevs standard deviations away from the mean.
    Dim dblAverage As Double, dblStdDev As Double
    Dim NoStdDevs As Integer
    Dim rTest As Range, Rng As Range
    'Application.ScreenUpdating = False
    NoStdDevs = 3 'adjust to your outlier preference of sigma
    Set rTest = Selection 'or: Application.InputBox("Select a range", "Get Range", Type:=8)
    dblAverage = WorksheetFunction.Average(rTest)
    dblStdDev = WorksheetFunction.StDev(rTest) 'sample standard deviation
    For Each Rng In rTest
        'Average and StDev ignore blank and text cells, so only test numeric cells
        If VarType(Rng.Value2) = vbDouble Then
            If Rng.Value2 > dblAverage + NoStdDevs * dblStdDev Or Rng.Value2 < dblAverage - NoStdDevs * dblStdDev Then
                Rng.Interior.Color = RGB(255, 0, 0) 'or mark with Rng.Value = "Outlier", or delete with Rng.ClearContents
            End If
        End If
    Next Rng
    'Application.ScreenUpdating = True
End Sub
 
====
Normalizing data in Excel (simple linear min-max scaling to the 0-1 range):
http://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range?newreg=f530ce192d144b109ec26a077cab00af 
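
The linear scaling discussed there is x' = (x - min) / (max - min). A small VBA sketch of it follows; the macro name is hypothetical, and it assumes a single-column selection with a free column to its right for the results.

```vba
Sub NormalizeSelectionTo01()
    ' Hypothetical helper: writes (x - min) / (max - min) for each numeric
    ' cell of a single-column selection into the cell to its right.
    Dim minVal As Double, maxVal As Double
    Dim cell As Range

    If Selection.Columns.Count > 1 Then
        MsgBox "Please select a single column."
        Exit Sub
    End If

    minVal = WorksheetFunction.Min(Selection)
    maxVal = WorksheetFunction.Max(Selection)
    If maxVal = minVal Then
        MsgBox "All values are equal; the 0-1 scaling is undefined."
        Exit Sub
    End If

    For Each cell In Selection.Cells
        'skip blanks and text; only numeric cells are scaled
        If VarType(cell.Value2) = vbDouble Then
            cell.Offset(0, 1).Value = (cell.Value2 - minVal) / (maxVal - minVal)
        End If
    Next cell
End Sub
```

The same thing without a macro, assuming the data sits in A2:A100 (an example range only): =(A2-MIN($A$2:$A$100))/(MAX($A$2:$A$100)-MIN($A$2:$A$100)), filled down. Note that min-max scaling is itself sensitive to outliers, since a single extreme value sets the minimum or maximum.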
 
====
Kernel density / regression, alternatives to the histogram:
Good article: http://www.stat-d.si/mz/mz4.1/vidmar.pdf
http://people.revoledu.com/kardi/tutorial/index.html
2D: http://www.r-bloggers.com/recipe-for-computing-and-sampling-multivariate-kernel-density-estimates-and-plotting-contours-for-2d-kdes/
 
Density plugin: http://www.prodomosua.eu/ppage02.html 

Good VBA example for density plots (including a UDF):
http://www.iimahd.ernet.in/~jrvarma/software.php
(found via this page: http://www.mathfinance.cn/category/vba/1/5/)
 
Some R code: http://www.wessa.net/rwasp_density.wasp#output
Another article: http://www.rsc.org/images/data-distributions-kernel-density-technical-brief-4_tcm18-214836.pdf
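
To make the idea concrete, here is a minimal sketch of a Gaussian kernel density estimate as a worksheet UDF: f(x) = 1/(n*h) * sum of K((x - x_i)/h), with K the standard normal density. This is my own illustration, not the code from any of the links above; the function name and its arguments are hypothetical, and the bandwidth h has to be supplied by the caller.

```vba
Function KernelDensity(x As Double, data As Range, bandwidth As Double) As Variant
    ' Hypothetical UDF: Gaussian kernel density estimate at point x,
    ' using the numeric cells of "data" and a caller-supplied bandwidth.
    Const PI As Double = 3.14159265358979
    Dim cell As Range
    Dim n As Long
    Dim total As Double, u As Double

    If bandwidth <= 0 Then
        KernelDensity = CVErr(xlErrNum)
        Exit Function
    End If

    For Each cell In data.Cells
        'skip blanks and text; only numeric cells contribute
        If VarType(cell.Value2) = vbDouble Then
            u = (x - cell.Value2) / bandwidth
            total = total + Exp(-0.5 * u * u)
            n = n + 1
        End If
    Next cell

    If n = 0 Then
        KernelDensity = CVErr(xlErrDiv0)
    Else
        'normalise by n, the bandwidth and the Gaussian constant sqrt(2*pi)
        KernelDensity = total / (n * bandwidth * Sqr(2 * PI))
    End If
End Function
```

Example use, with made-up cell addresses: put a grid of x values in column C, the bandwidth in E1, and enter =KernelDensity(C2, $A$2:$A$101, $E$1) next to each grid point, then chart the result as a line. One common starting point for the bandwidth is Silverman's rule of thumb, =0.9*MIN(STDEV(A2:A101),(QUARTILE(A2:A101,3)-QUARTILE(A2:A101,1))/1.34)*COUNT(A2:A101)^(-1/5), but it is only a rule of thumb and worth adjusting by eye.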

 
Some plugins/VBA to check: http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/Software/RobustStatistics.asp

Cheatsheet for R (visualizing distributions with ggplot2): http://www.r-bloggers.com/ggplot2-cheatsheet-for-visualizing-distributions/

Good Wikipedia entry: https://en.wikipedia.org/wiki/Outlier
Advanced article: http://d-scholarship.pitt.edu/7948/1/Seo.pdf
For beginners: https://www.dataz.io/display/Public/2013/03/20/Describing+Data%3A+Why+median+and+IQR+are+often+better+than+mean+and+standard+deviation

====
Simple MAD (median absolute deviation) solution with Excel formulas: http://www.codeproject.com/Tips/214330/Statistical-Outliers-detection
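
A VBA sketch of a MAD-based check, in case a macro is preferred over formulas. It is not necessarily the method used in the linked tip: it uses the modified z-score 0.6745 * (x - median) / MAD with the commonly quoted cutoff of 3.5, and the macro name and orange highlight are my own assumptions.

```vba
Sub HighlightMadOutliers()
    ' Hypothetical helper: flags selected cells whose modified z-score
    ' 0.6745 * (x - median) / MAD exceeds THRESHOLD in absolute value.
    Const THRESHOLD As Double = 3.5 'commonly quoted cutoff; adjust to taste
    Dim med As Double, mad As Double, score As Double
    Dim devs() As Double
    Dim cell As Range
    Dim n As Long, i As Long

    n = WorksheetFunction.Count(Selection) 'number of numeric cells
    If n = 0 Then Exit Sub
    med = WorksheetFunction.Median(Selection)

    'collect absolute deviations from the median
    ReDim devs(1 To n)
    For Each cell In Selection.Cells
        If VarType(cell.Value2) = vbDouble Then
            i = i + 1
            devs(i) = Abs(cell.Value2 - med)
        End If
    Next cell

    mad = WorksheetFunction.Median(devs)
    If mad = 0 Then
        MsgBox "MAD is zero (more than half the values are identical); the score is undefined."
        Exit Sub
    End If

    For Each cell In Selection.Cells
        If VarType(cell.Value2) = vbDouble Then
            score = 0.6745 * (cell.Value2 - med) / mad
            If Abs(score) > THRESHOLD Then cell.Interior.Color = RGB(255, 165, 0)
        End If
    Next cell
End Sub
```

The MAD itself can also be computed on the sheet with an array formula, for example =MEDIAN(ABS(A2:A101-MEDIAN(A2:A101))) entered with Ctrl+Shift+Enter (the range is only an example).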
