Thursday, 16 February 2017

Python in RStudio

Data used:
Titanic data: https://www.kaggle.com/c/titanic/data
and the accompanying tutorial: http://nbviewer.jupyter.org/github/savarin/pyconuk-introtutorial/blob/master/notebooks/Section%201-0%20-%20First%20Cut.ipynb

Flights data: http://ucl.ac.uk/~uctqiax/data/flights.csv

Software used:
WinPython, a portable scientific Python distribution that ships with pandas and scikit-learn:
https://sourceforge.net/projects/winpython/?source=typ_redirect
To get it working I had to install a Windows update (my OS is Windows 7):
https://www.microsoft.com/en-us/download/confirmation.aspx?id=49093

Installing packages from source also required the Visual C++ Build Tools:
http://landinghub.visualstudio.com/visual-cpp-build-tools
I needed the feather package, so I downloaded its source and installed it with the pip install command,
as described here: https://github.com/winpython/winpython/wiki/Installing-Additional-Packages
using the source from: https://github.com/wesm/feather/tree/master/python
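Before knitting, it is worth confirming that the WinPython interpreter actually sees the packages mentioned above. This is a minimal sketch using only the standard library; check_packages is a hypothetical helper, and note that scikit-learn is imported under the module name sklearn:

```python
import importlib.util


def check_packages(names):
    """Report which top-level modules the running interpreter can locate.

    importlib.util.find_spec looks a module up without executing it and
    returns None when it cannot be found.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run this with the same python.exe that RStudio will use.
    for name, found in check_packages(["pandas", "sklearn", "feather"]).items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

find_spec only locates a module, so an import can still fail later, for example when a compiled extension is missing a runtime library.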

How to use other languages (knitr engines) in RStudio notebooks is explained here: http://rmarkdown.rstudio.com/authoring_knitr_engines.html
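The engine.path option has to hold the interpreter's full path written as an R string, which is why every backslash is doubled in the notebook's chunk headers. This is a minimal sketch that prints a ready-to-paste option for whichever Python runs it; the helper names r_string_path and engine_option are hypothetical:

```python
import sys


def r_string_path(path):
    """Escape a Windows path for an R string literal by doubling each backslash."""
    return path.replace("\\", "\\\\")


def engine_option(executable=sys.executable):
    """Return an engine.path option for an R Markdown chunk header.

    Defaults to sys.executable, the full path of the running interpreter.
    """
    return 'engine.path="' + r_string_path(executable) + '"'


if __name__ == "__main__":
    # Run with WinPython's python.exe and paste the output into the chunk header.
    print(engine_option())
```

Run with the WinPython interpreter, it should print an option of the same form as the one used in the notebook's Python chunk.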

I also wanted to see whether a portable version of bash would work. It did, without any problems:
http://win-bash.sourceforge.net/



The code of my playground notebook:
---
title: "R Notebook"
output: html_notebook
---

## Bash

```{bash, engine.path="C:\\Users\\jkotows2\\Desktop\\shell.w32-ix86\\bash.exe"}
cat flights1.csv flights2.csv flights3.csv > flights.csv
```
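`cat` joins the three files byte for byte. If each part file begins with its own header row (an assumption, since the part files are not described here), that header is repeated inside `flights.csv`, and pandas would read the extra copies as data rows, which would typically leave numeric columns such as `dep_delay` loaded as text. This is a minimal Python sketch that writes the header only once; `concat_csv` is a hypothetical helper:

```python
def _with_newline(line):
    """Return the line with a trailing newline, adding one if it is missing."""
    return line if line.endswith("\n") else line + "\n"


def concat_csv(parts, out_path, encoding="utf-8"):
    """Concatenate CSV files that share one header line, writing the header once.

    The first line of every file after the first is compared with the first
    file's header and skipped. Lines are copied as text, so this assumes no
    quoted field contains a line break.
    """
    header = None
    with open(out_path, "w", encoding=encoding, newline="") as out:
        for part in parts:
            with open(part, encoding=encoding, newline="") as f:
                first = f.readline()
                if header is None:
                    header = first
                    out.write(_with_newline(first))
                elif first.rstrip("\r\n") != header.rstrip("\r\n"):
                    raise ValueError(f"{part}: header differs from the first file")
                for line in f:
                    # Adding a missing newline stops the next file's first row
                    # from being glued onto this file's last row.
                    out.write(_with_newline(line))
```

For the files in this notebook the call would be `concat_csv(["flights1.csv", "flights2.csv", "flights3.csv"], "flights.csv")`.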

## Python

http://rmarkdown.rstudio.com/authoring_knitr_engines.html

```{python, engine.path="C:\\Users\\jkotows2\\Desktop\\WinPython\\python-3.6.0.amd64\\python.exe"}
import pandas
import feather

# Read flights data and select flights to O'Hare
flights = pandas.read_csv("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.csv")
flights = flights[flights['dest'] == "ORD"]

# Select carrier and delay columns and drop rows with missing values
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
print(flights.head(10))

# Write to feather file for reading from R
feather.write_dataframe(flights, "C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
```
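To see exactly what the three pandas steps in this chunk do, here is the same filter, select and drop-missing pipeline sketched in plain Python with the standard `csv` module. It assumes missing values appear as `NA` or empty fields in `flights.csv`; the function `ord_delays` and the `MISSING` set are hypothetical, and `pandas.read_csv` recognises more missing-value spellings than this sketch does:

```python
import csv

# Assumed spellings of a missing value in flights.csv; "NA" (as written by R),
# "NaN" and empty fields are all in pandas.read_csv's default missing-value list.
MISSING = {"", "NA", "NaN"}


def ord_delays(lines, dest="ORD"):
    """Plain-Python version of the pandas steps: filter, select columns, dropna.

    `lines` is any iterable of CSV text lines (an open file works). Returns a
    list of (carrier, dep_delay, arr_delay) tuples with the delays as floats.
    """
    result = []
    for row in csv.DictReader(lines):
        if row["dest"] != dest:  # flights[flights['dest'] == "ORD"]
            continue
        # flights[['carrier', 'dep_delay', 'arr_delay']]
        carrier, dep, arr = row["carrier"], row["dep_delay"], row["arr_delay"]
        if carrier in MISSING or dep in MISSING or arr in MISSING:  # flights.dropna()
            continue
        result.append((carrier, float(dep), float(arr)))
    return result
```

With the concatenated file open as `f` (opened with `newline=""`), `ord_delays(f)` returns the rows that the pandas chunk keeps, as plain tuples.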

## Back to R

```{r}
library(feather)
library(ggplot2)

# Read from feather and plot
flights <- read_feather("C:\\Users\\jkotows2\\Desktop\\_flights\\flights.feather")
# geom_jitter() alone draws each flight once; adding geom_point() would plot every flight twice
ggplot(flights, aes(carrier, arr_delay)) + geom_jitter()
```
