Episodes

Saturday Oct 27, 2018
Episode 2: Data Wrangling: Why you gotta do what you gotta do
A common complaint about data science is that 90% of your time is spent wrangling data. In this episode, I talk about some of the history that led to this state of data science work, and why you should embrace it. I also share some resources that will help you wrangle data at the raw level.
R packages and command line tools mentioned in this episode:
R:
| Package | Description |
| --- | --- |
| lubridate | Handling dates, datetimes, intervals, durations |
| readr | Reading in CSV and related textual files |
| readxl | Reading in Excel files |
| jsonlite | Reading, writing and manipulating JSON structures |
| httr | Reading HTML and extracting parts programmatically |
| dplyr + purrr | Simple grammar for common data manipulations |
Command line tools:
| Utility | Description |
| --- | --- |
| head | Show first few lines of a text file |
| less [-S] | Pager to make sure data you look at doesn't scroll off the screen |
| wc | Count lines, words, and characters in a file |
| csvlook | Tool from the Python package csvkit, which helps format and manipulate CSV files from the command line |
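
As a rough illustration of how these utilities fit together when you first open a raw file, here is a minimal sketch. The sample file and its columns (`id`, `visit_date`, `score`) are made up for this example and are not from the episode; `csvlook` is only run if csvkit happens to be installed.

```shell
# Create a small sample CSV so the example is self-contained.
# (Hypothetical data, not from the episode.)
sample="$(mktemp)"
printf 'id,visit_date,score\n1,2018-10-27,3.5\n2,2018-10-28,4.1\n3,2018-10-29,2.8\n' > "$sample"

# head: look at the header row plus the first two records.
first_lines="$(head -n 3 "$sample")"
echo "$first_lines"

# wc: count the lines in the file (header included).
# tr strips the padding that some wc implementations add.
line_count="$(wc -l < "$sample" | tr -d ' ')"
echo "lines: $line_count"

# less -S is interactive (it chops long lines instead of wrapping them),
# so it is left commented out here:
# less -S "$sample"

# csvlook ships with csvkit; skip it if csvkit is not installed.
if command -v csvlook >/dev/null 2>&1; then
  csvlook "$sample"
fi
```

Checking the first few lines and the line count before loading a file into R is a cheap way to catch a missing header, an unexpected delimiter, or a truncated download.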