Titanic data: https://www.kaggle.com/c/titanic/data
and tutorial: http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb
Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv
Software used:
Portable scientific WinPython (includes pandas and scikit-learn):
https://sourceforge.net/projects/winpython/?source=typ_redirect
To run, it needed a Windows update (my OS is Windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093
Installing packages from source also required the Visual C++ Build Tools:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed the feather package, so I downloaded it and installed it with the Python command pip install,
as described here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
The package was installed from source: https://github.com/wesm/feather/tree/master/python
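After installing, it can help to confirm that the packages are visible to the WinPython interpreter before wiring it into RStudio. The following is only a minimal sketch using the standard library; the helper name `missing_packages` is made up for illustration, and the import names (`pandas`, `sklearn`, `feather`) are assumed to match the packages listed above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages mentioned in this post:
# pandas, scikit-learn (imported as "sklearn") and feather.
required = ["pandas", "sklearn", "feather"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the same python.exe that RStudio will later be pointed at, since each Python installation has its own set of packages.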
To learn how to use other languages in RStudio: http://rmarkdown.rstudio.com/authoring_knitr_engines.html
I also wanted to check whether a portable version of bash would work. No problem:
http://win-bash.sourceforge.net/
Code for my playground.
---
title: "R Notebook"
output: html_notebook
---

## Bash

```{bash, engine.path="C:\\Users\\jkotows2\\Desktop\\shell.w32-ix86\\bash.exe"}
cat flights1.csv flights2.csv flights3.csv > flights.csv
```

## Python

http://rmarkdown.rstudio.com/authoring_knitr_engines.html

```{python, engine.path="C:\\Users\\jkotows2\\Desktop\\WinPython\\python-3.6.0.amd64\\python.exe"}
import pandas
import feather

# Read flights data and select flights to O'Hare
flights = pandas.read_csv("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.csv")
flights = flights[flights['dest'] == "ORD"]

# Select carrier and delay columns and drop rows with missing values
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
print(flights.head(10))

# Write to feather file for reading from R
feather.write_dataframe(flights, "C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
```

## Back to R

```{r}
library(feather)
library(ggplot2)

# Read from feather and plot
flights <- read_feather("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
ggplot(flights, aes(carrier, arr_delay)) + geom_point() + geom_jitter()
```
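One thing to watch in the Bash step: `cat` joins the files byte for byte, so if each part file starts with its own header row, those extra header lines end up in the middle of flights.csv. Whether the part files have headers is not stated here, so the following is only a sketch under that assumption. It uses Python's standard `csv` module, and the function name `concat_csv` is made up for illustration.

```python
import csv


def concat_csv(input_paths, output_path):
    """Concatenate CSV files, writing the shared header row only once.

    Assumes every non-empty input file begins with the same header row.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # First header seen: keep it and write it once
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header row")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `concat_csv(["flights1.csv", "flights2.csv", "flights3.csv"], "flights.csv")` would then stand in for the `cat` line. If the part files have no header rows at all, plain `cat` is already fine.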