Episode 2: Data Wrangling: Why you gotta do what you gotta do

A common complaint about data science is that 90% of your time is spent on data wrangling. In this episode, I talk about some of the history that led to the current state of data science work, and why you should embrace it. I also share some resources that will help you with your data wrangling at the raw level.

 

R Packages and Tools mentioned in this episode:

 

R:

 

Package        Description
lubridate      Handling dates, datetimes, intervals, durations
readr          Reading in CSV and related text files
readxl         Reading in Excel files
jsonlite       Reading, writing and manipulating JSON structures
httr           Fetching web pages and API responses over HTTP so parts can be extracted programmatically
dplyr + purrr  Simple grammar for common data manipulations

 

Command line tools:

 

Utility    Description
head       Show first few lines of a text file
less [-S]  Pager to make sure data you look at doesn't scroll off the screen (-S keeps long lines from wrapping)
wc         Count lines, words, and characters in a file
csvlook    Command from csvkit, a Python package that helps format and manipulate CSV files from the command line
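
As a rough sketch of how these utilities might be combined for a first look at a raw file: the file name sample.csv and its contents are hypothetical, created here only for illustration. The less and csvlook calls are left commented out, since less is interactive and csvlook assumes csvkit is installed.

```shell
# Create a small hypothetical CSV file purely for illustration
printf 'id,name,date\n1,alice,2020-01-05\n2,bob,2020-02-11\n3,carol,2020-03-17\n' > sample.csv

# Show the first two lines (the header plus the first record)
head -n 2 sample.csv

# Count the lines in the file (header included)
wc -l < sample.csv

# Page through the file without wrapping long lines (interactive)
# less -S sample.csv

# Render the CSV as a formatted table (assumes csvkit is installed)
# csvlook sample.csv
```

Checking the line count and the first few rows before loading a file into R is a cheap way to catch truncated downloads or unexpected headers.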

 

 
