Saturday Oct 27, 2018
Episode 2: Data Wrangling: Why you gotta do what you gotta do
A common complaint about data science is that 90% of your time is spent on data wrangling. In this episode, I talk about some of the history that led to the current state of data science work, and why you should embrace it. I also share some resources that will help you with data wrangling at the raw level.
R Packages and Tools mentioned in this episode:
R:
Package | Description |
lubridate | Handling dates, date-times, intervals, and durations |
readr | Reading in CSV and related textual files |
readxl | Reading in Excel files |
jsonlite | Reading, writing and manipulating JSON structures |
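httr | Making HTTP requests (GET, POST, etc.) to fetch web pages and API data programmatically |
dplyr + purrr | Simple grammar for common data manipulations (dplyr) and tools for iterating over lists and vectors (purrr) |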
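The R packages above are the tools discussed in the episode. As a rough illustration of the same raw-level chores (reading delimited text, parsing dates into intervals, decoding JSON), here is a minimal sketch using only the Python standard library. It is an analog, not the R packages themselves, and the sample data and helper names (`read_rows`, `add_duration`) are hypothetical.

```python
import csv
import io
import json
from datetime import date

# Hypothetical raw inputs standing in for a CSV file and a JSON payload.
RAW_CSV = "id,start,end\n1,2018-10-01,2018-10-27\n2,2018-09-15,2018-10-01\n"
RAW_JSON = '{"episode": 2, "tags": ["wrangling", "R"]}'


def read_rows(text):
    """Parse CSV text into a list of dicts.

    Every value comes back as a string; unlike readr in R,
    csv.DictReader does no column type guessing.
    """
    return list(csv.DictReader(io.StringIO(text)))


def add_duration(rows):
    """Parse ISO 8601 dates and add the span in days (the kind of job lubridate handles)."""
    for row in rows:
        start = date.fromisoformat(row["start"])
        end = date.fromisoformat(row["end"])
        row["days"] = (end - start).days  # subtracting dates gives a timedelta
    return rows


rows = add_duration(read_rows(RAW_CSV))
meta = json.loads(RAW_JSON)  # JSON text -> nested dicts/lists (cf. jsonlite::fromJSON)
print(rows[0]["days"], meta["tags"])
```

Note that the explicit date parsing step is exactly the "raw level" work the episode describes: the CSV reader hands you text, and turning it into typed values is up to you.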
Command line tools:
Utility | Description |
head | Show first few lines of a text file |
less [-S] | Pager for viewing a file one screen at a time so data doesn't scroll off the screen; -S chops long lines instead of wrapping them |
wc | Count lines, words, and bytes (or characters with -m) in a file |
csvlook | Part of csvkit, a Python package of command line tools for CSV files; renders CSV as a readable, aligned table in the terminal |
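The command line tools above are for taking a first look at a raw file before loading it anywhere. A minimal Python sketch of the same idea, assuming you want `head`-style and `wc`-style checks inside a script, might look like this. The function names are hypothetical, and the `wc` analog mirrors its default output of lines (newline count), words (whitespace-separated runs), and bytes.

```python
import io


def head(lines, n=5):
    """Return the first n lines without trailing newlines, like `head -n 5`."""
    out = []
    for i, line in enumerate(lines):
        if i >= n:
            break  # stop early so large files are not read in full
        out.append(line.rstrip("\n"))
    return out


def wc(text):
    """Return (lines, words, bytes), like `wc` with no flags."""
    n_lines = text.count("\n")             # wc -l counts newline characters
    n_words = len(text.split())            # runs of non-whitespace
    n_bytes = len(text.encode("utf-8"))    # wc -c counts bytes, not characters
    return n_lines, n_words, n_bytes


# Self-contained usage with in-memory text; with a real file you would
# pass an open file object, e.g. `with open(path) as f: head(f, 10)`.
sample = "a,b\n1,2\n3,4\n"
print(head(io.StringIO(sample), 2))
print(wc(sample))
```

Because `head` accepts any iterable of lines, it works on an open file without reading the whole thing into memory, which matches why you reach for `head` on a large raw file in the first place.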